WO2007007263A2 - Audio encoding and decoding - Google Patents
- Publication number
- WO2007007263A2 (PCT/IB2006/052309)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- hierarchical
- decoder
- channel
- data stream
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/233—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
Definitions
- the invention relates to audio encoding and/or decoding using hierarchical encoding structures and/or hierarchical decoder structures.
- an audio signal may be converted into another format to provide an enhanced user experience.
- traditional stereo recordings only comprise two channels whereas modern advanced audio systems typically use five or six channels, as in the popular 5.1 surround sound systems. Accordingly, the two stereo channels may be converted into five or six channels in order to take full advantage of the advanced audio system.
- stereo audio signals can be encoded as single channel audio signals combined with a parameter bit stream describing the spatial properties of the audio signal.
- the decoder can reproduce the stereo audio signals with a very satisfactory degree of accuracy. In this way, substantial bit rate savings may be obtained.
- One such parameter is the inter-channel cross-correlation, such as the cross-correlation between the left channel and the right channel for stereo signals.
- Another parameter is the power ratio of the channels.
- In (parametric) spatial audio (en)coders, these and other parameters are extracted from the original audio signal so as to produce an audio signal having a reduced number of channels, for example only a single channel, plus a set of parameters describing the spatial properties of the original audio signal.
- In (parametric) spatial audio decoders, the original audio signal is reconstructed. Spatial Audio Coding is a recently introduced technique to efficiently code multi-channel audio material.
- an M-channel audio signal is described as an N-channel audio signal plus a set of corresponding spatial parameters where N is typically smaller than M.
- the M-channel signal is down-mixed to an N-channel signal and the spatial parameters are extracted.
- the N-channel signal and the spatial parameters are employed to (perceptually) reconstruct the M-channel signal.
- Such spatial audio coding preferably employs a cascaded or tree-based hierarchical structure comprising standard units in the encoder and the decoder.
- these standard units can be down-mixers combining channels into a lower number of channels such as 2-to-1, 3-to-1, 3-to-2, etc. down-mixers, while in the decoder corresponding standard units can be up-mixers splitting channels into a higher number of channels such as 1-to-2, 2-to-3 up-mixers.
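As a rough illustration of the down-mix and parameter extraction described above, the following Python sketch reduces two channels to one and computes the two parameters named in the text, the inter-channel cross-correlation and the power ratio. The function name `tto_downmix`, the simple averaging down-mix and the dB scaling of the power ratio are assumptions made for illustration; the source does not specify them.

```python
import math


def tto_downmix(left, right):
    """Down-mix two channels into one and extract two spatial parameters.

    Hypothetical helper: averaging down-mix, power ratio in dB, and a
    normalized cross-correlation computed over the whole block.
    """
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]

    power_left = sum(l * l for l in left)
    power_right = sum(r * r for r in right)
    cross = sum(l * r for l, r in zip(left, right))

    eps = 1e-12  # guards against division by zero for silent channels
    power_ratio_db = 10.0 * math.log10((power_left + eps) / (power_right + eps))
    correlation = cross / math.sqrt((power_left + eps) * (power_right + eps))

    return mono, {"power_ratio_db": power_ratio_db, "correlation": correlation}


mono, params = tto_downmix([1.0, 0.0, -1.0, 0.0], [0.5, 0.0, -0.5, 0.0])
```

Here the right channel is a half-amplitude copy of the left, so the correlation is close to 1 and the power ratio is about 6 dB.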
- an improved system would be advantageous and in particular a system allowing increased flexibility, reduced complexity and/or improved performance would be advantageous.
- an apparatus for generating a number of output audio channels comprising: means for receiving a data stream comprising a number of input audio channels and parametric audio data; the data stream further comprising decoder tree structure data for a hierarchical decoder structure, the decoder tree structure data comprising at least one data value indicative of channel split characteristics for an audio channel at a hierarchical layer of the hierarchical decoder structure; means for generating the hierarchical decoder structure in response to the decoder tree structure data; and means for generating the number of output audio channels from the data stream using the hierarchical decoder structure.
- the invention may allow a flexible generation of audio channels and may in particular allow a decoder functionality to adapt to an encoder structure used for generating the data stream.
- the invention may e.g. allow an encoder to select a suitable encoding approach for a multi-channel signal while allowing the apparatus to automatically adapt thereto.
- the invention may allow a data stream having an improved quality to bit-rate ratio.
- the invention may allow automatic adaptation and/or a high degree of flexibility while providing the improved audio quality achievable from hierarchical encoding/decoding structures.
- the invention may furthermore allow an efficient communication of information of the hierarchical decoder structure.
- the invention may allow a low overhead for the decoder tree structure data.
- the invention may provide an apparatus which automatically adapts to the received bit-stream and which may be used with any suitable hierarchical encoding structure.
- Each audio channel may support an individual audio signal.
- the data stream may be a single bit-stream or may e.g. be a combination of a plurality of sub-bit-streams, for example distributed through different distribution channels.
- the data stream may have a limited duration such as a fixed duration corresponding to a data file of a given size.
- the channel split characteristic may be a characteristic indicative of how many channels a given audio channel is split into at a hierarchical layer. For example, the channel split characteristic may reflect if a given audio channel is not divided or whether it is divided into two audio channels.
- the decoder tree structure data may comprise data for the hierarchical decoder structure of a plurality of audio channels.
- the decoder tree structure data may comprise a set of data for each of the number of input audio channels.
- the decoder tree structure data may comprise data for a decoder tree structure for each input signal.
- the decoder tree structure data comprises a plurality of data values, each data value indicative of a channel split characteristic for one channel at one hierarchical layer of the hierarchical decoder structure.
- the decoder tree structure data may specifically comprise one data value for each channel split function in the hierarchical decoder structure.
- the decoder tree structure data may also comprise one data value for each output channel indicating that no further channel splits occur for a given hierarchical layer signal.
- a predetermined data value is indicative of no channel split for the channel at the hierarchical layer.
- a predetermined data value is indicative of a one-to-two channel split for the channel at the hierarchical layer. This may provide for an efficient communication of data allowing the apparatus to effectively and reliably adapt to the encoding used for the data stream. In particular, this may allow very efficient information transfer for many hierarchical systems using low complexity standard channel split functions.
- the plurality of data values are binary data values.
- This may provide for an efficient communication of data allowing the apparatus to effectively and reliably adapt to the encoding used for the data stream.
- this may allow very efficient information transfer for systems mainly using one specific channel split functionality, such as a one-to-two channel split functionality.
- one predetermined binary data value is indicative of a one-to-two channel split and another predetermined binary data value is indicative of no channel split.
- This may provide for an efficient communication of data allowing the apparatus to effectively and reliably adapt to the encoding used for the data stream.
- this may allow very efficient information transfer for systems based around a low complexity one-to-two channel split functionality.
- An efficient decoding may be achieved by a low complexity hierarchical decoder structure which may be generated in response to low complexity data.
- the feature may allow a low overhead for the communication of decoder tree structure data and may be particularly suited for data streams encoded by a simple encoding function.
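The binary signalling described above can be sketched as a small parser. In this sketch a bit value of 1 denotes a one-to-two channel split and 0 denotes no split. The depth-first (pre-order) bit ordering, the tuple representation of the tree and the function names are assumptions made for illustration; the source only states that each binary value describes one channel at one hierarchical layer.

```python
def parse_decoder_tree(bits):
    """Build a decoder tree from binary decoder tree structure data.

    Assumed convention: pre-order traversal, 1 = one-to-two split,
    0 = no split (the channel is an output channel).
    A leaf is represented by None, a split by a (first, second) tuple.
    """
    position = 0

    def parse_node():
        nonlocal position
        if position >= len(bits):
            raise ValueError("decoder tree structure data is truncated")
        bit = bits[position]
        position += 1
        if bit == 0:
            return None
        first = parse_node()
        second = parse_node()
        return (first, second)

    tree = parse_node()
    if position != len(bits):
        raise ValueError("unexpected trailing bits in decoder tree structure data")
    return tree


def count_output_channels(tree):
    """Count the leaves of the tree, i.e. the generated output channels."""
    if tree is None:
        return 1
    return count_output_channels(tree[0]) + count_output_channels(tree[1])


# One input channel expanded to six output channels by five one-to-two splits.
tree = parse_decoder_tree([1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0])
outputs = count_output_channels(tree)
```

With this convention a tree with S splits costs 2S + 1 bits per input channel, which keeps the overhead of the structure data low.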
- the data stream further comprises an indication of the number of input channels.
- the means for generating the hierarchical decoder structure may do so in response to the indication of the number of input channels. For example, in many practical situations the number of input channels can be derived from the data stream; however, in some special cases the audio and parameter data may be separated. In such cases it may be beneficial if the number of input channels is known, as the data stream data might have been manipulated (e.g. downmixed from stereo to mono).
- the data stream further comprises an indication of the number of output channels.
- This may facilitate the decoding and the generation of the decoding structure and/or may allow a more efficient encoding of information of the hierarchical decoder structure in the decoder tree structure data.
- the means for generating the hierarchical decoder structure may do so in response to the indication of the number of output channels.
- the indication may be used as an error check of the decoder tree structure data.
- the data stream comprises an indication of a number of one-to-two channel split functions in the hierarchical decoder structure. This may facilitate the decoding and the generation of the decoding structure and/or may allow a more efficient encoding of information of the hierarchical decoder structure in the decoder tree structure data.
- the means for generating the hierarchical decoder structure may do so in response to the indication of the number of one-to-two channel split functions in the hierarchical decoder structure.
- the data stream further comprises an indication of a number of two-to-three channel split functions in the hierarchical decoder structure.
- the means for generating the hierarchical decoder structure may do so in response to the indication of the number of two-to-three channel split functions in the hierarchical decoder structure.
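The signalled channel and split counts lend themselves to a simple consistency check, since every one-to-two split and every two-to-three split adds exactly one channel. The sketch below shows one way such an error check might look; the function names are hypothetical and the source does not prescribe this particular check.

```python
def expected_output_channels(num_inputs, num_one_to_two, num_two_to_three=0):
    """Each 1-to-2 split and each 2-to-3 split adds exactly one channel."""
    return num_inputs + num_one_to_two + num_two_to_three


def counts_are_consistent(num_inputs, num_outputs, num_one_to_two, num_two_to_three=0):
    """Return True if the signalled counts agree with each other."""
    expected = expected_output_channels(num_inputs, num_one_to_two, num_two_to_three)
    return expected == num_outputs


# Mono down-mix expanded to 5.1 with five one-to-two splits.
mono_ok = counts_are_consistent(num_inputs=1, num_outputs=6, num_one_to_two=5)

# Stereo down-mix: one two-to-three split at the root, then three one-to-two splits.
stereo_ok = counts_are_consistent(
    num_inputs=2, num_outputs=6, num_one_to_two=3, num_two_to_three=1
)
```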
- the decoder tree structure data comprises data for a plurality of decoder tree structures ordered in response to the presence of a two-to-three channel split functionality.
- the feature may allow advantageous performance in systems wherein two-to-three channel splits may only occur at the root layer.
- the means for generating the hierarchical decoder structure may first generate the two-to-three split functionality for two input channels followed by the generation of the remaining structure using only one-to-two channel split functionality.
- the remaining structure may specifically be generated in response to the binary decoder tree structure data thus reducing the required bit rate.
- the data stream may further contain information of the ordering of the plurality of decoder tree structures.
- the decoder tree structure data for at least one input channel comprises an indication of a two-to-three channel split function being present at the root layer followed by binary data where each binary data value is indicative of either no split functionality or a one-to-two channel split functionality for dependent layers of the two-to-three split functionality.
- the feature may facilitate the decoding and the generation of the decoding structure and/or may allow a more efficient encoding of information of the hierarchical decoder structure in the decoder tree structure data.
- the feature may allow advantageous performance in systems where two-to-three channel splits may only occur at the root layer.
- the means for generating the hierarchical decoder structure may first generate the two-to-three split functionality for an input channel followed by the generation of the remaining structure using only one-to-two channel split functionality.
- the remaining structure may specifically be generated in response to binary decoder tree structure data thus reducing the required bit rate.
- the data stream comprises an indication of a loudspeaker position for at least one of the output channels.
- This may allow facilitated decoding and may allow improved performance and/or adaptation of the apparatus thus providing increased flexibility.
- the means for generating the hierarchical decoder structure is arranged to determine multiplication parameters for channel split functions of the hierarchical layers in response to the decoder tree structure data.
- the feature may allow not only the hierarchical decoder structure but also the operation of the channel split functions to adapt to the received data stream.
- the multiplication parameters may be matrix multiplication parameters.
- the decoder tree structure comprises at least one channel split functionality in at least one hierarchical layer, the at least one channel split functionality comprising: de-correlation means for generating a de-correlated signal directly from an audio input channel of the data stream; at least one channel split unit for generating a plurality of hierarchical layer output channels from an audio channel from a higher hierarchical layer and the de-correlated signal; and means for determining at least one characteristic of the de-correlation filter or the channel split unit in response to the decoder tree structure data.
- the feature may allow improved performance and/or an improved adaptation/flexibility.
- the feature may allow a hierarchical decoder structure which has improved decoding performance and which may generate output channels having increased audio quality.
- a hierarchical decoder structure wherein no de-correlation signals are generated by cascaded de-correlation filters may be achieved and dynamically and automatically adapted to the received data stream.
- the de-correlation filter receives the audio input channel of the data stream without modifications, and specifically without any prior filtering of the signal (such as by another de-correlation filter).
- the gain of the de-correlation filter may specifically be determined in response to the decoder tree structure data.
- the de-correlation means comprises a level compensation means for performing an audio level compensation on the audio input channel to generate a level compensated audio signal; and a de-correlation filter for filtering the level compensated audio signal to generate the de-correlated signal.
- the level compensation means comprises a matrix multiplication by a pre-matrix. This may allow an efficient implementation.
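A one-to-two channel split of the kind described above can be sketched as three steps: level compensation of the input channel, de-correlation of the level compensated signal, and a 2x2 matrix applied to the input and the de-correlated signal. In this sketch the de-correlation filter is replaced by a plain delay and the pre-matrix is reduced to a scalar gain; both are simplifying assumptions, and all names are hypothetical.

```python
def decorrelate(signal, delay=2):
    """Stand-in de-correlation filter: a pure delay (assumption, not the source's filter)."""
    padded = [0.0] * delay + list(signal)
    return padded[: len(signal)]


def one_to_two_upmix(mono, pre_gain, split_matrix, delay=2):
    """Split one channel into two using the channel and its de-correlated version.

    pre_gain     -- scalar level compensation (a 1x1 pre-matrix in this sketch)
    split_matrix -- 2x2 matrix [[h11, h12], [h21, h22]] applied to (mono, decorrelated)
    """
    compensated = [pre_gain * x for x in mono]
    decorrelated = decorrelate(compensated, delay)

    (h11, h12), (h21, h22) = split_matrix
    out_a = [h11 * m + h12 * d for m, d in zip(mono, decorrelated)]
    out_b = [h21 * m + h22 * d for m, d in zip(mono, decorrelated)]
    return out_a, out_b


out_a, out_b = one_to_two_upmix(
    [1.0, 2.0, 3.0, 4.0], pre_gain=1.0, split_matrix=[[1.0, 1.0], [1.0, -1.0]]
)
```

Because the de-correlator here is fed from the input channel rather than from the output of another de-correlator, the sketch mirrors the non-cascaded arrangement the text describes.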
- the coefficients of the pre-matrix have at least one unity value for a hierarchical decoder structure comprising only one-to-two channel split functionality.
- the hierarchical decoder structure may comprise other functionality than the one-to-two channel split functionality but will in accordance with this feature not comprise any other channel split functionality.
- the apparatus further comprises means for determining the pre-matrix for the at least one channel split functionality in the at least one hierarchical layer in response to parameters of a channel split functionality in a higher hierarchical layer.
- the channel split functionality in a higher hierarchical layer may include a two-to-three channel split functionality e.g. located at the root layer of a decoder tree structure.
- the apparatus comprises means for determining a channel split matrix for the at least one channel split functionality in response to parameters of the at least one channel split functionality in the at least one hierarchical layer. This may allow efficient implementation and/or improved performance. This may be particularly advantageous for hierarchical decoder tree structures comprising only one-to-two channel split functionality.
- the apparatus further comprises means for determining the pre-matrix for the at least one channel split functionality in the at least one hierarchical layer in response to parameters of a two-to-three up-mixer of a higher hierarchical layer.
- the means for determining the pre-matrix is arranged to determine the pre-matrix for the at least one channel split functionality by determining a first sub-pre-matrix corresponding to a first input of the two-to-three up-mixer and a second sub-pre-matrix corresponding to a second input of the two-to-three up-mixer.
- This may allow efficient implementation and/or improved performance.
- This may be particularly advantageous for hierarchical decoder tree structures comprising a two-to-three channel split functionality at the root layer of a decoder tree structure.
- an apparatus for generating a data stream comprising a number of output audio channels comprising: means for receiving a number of input audio channels; hierarchical encoding means for parametrically encoding the number of input audio channels to generate the data stream comprising the number of output audio channels and parametric audio data; means for determining a hierarchical decoder structure corresponding to the hierarchical encoding means; and means for including decoder tree structure data comprising at least one data value indicative of a channel split characteristic for an audio channel at a hierarchical layer of the hierarchical decoder structure in the data stream.
- a data stream comprising: a number of encoded audio channels; parametric audio data; and decoder tree structure data for a hierarchical decoder structure, the decoder tree structure data comprising at least one data value indicative of channel split characteristics for audio channels at hierarchical layers of the hierarchical decoder structure.
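The three components of the data stream listed above could be held in a container such as the following. This is only a sketch of one possible in-memory representation; the class name, field names and field types are hypothetical and do not reflect any bit-stream syntax given in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AudioDataStream:
    """Hypothetical in-memory view of the data stream components."""

    # Encoded (down-mixed) audio channels, one sample list per channel.
    audio_channels: List[List[float]]
    # Parametric audio data, e.g. one parameter set per channel split.
    parametric_data: List[Dict[str, float]] = field(default_factory=list)
    # Decoder tree structure data, one bit list per input audio channel.
    tree_structure_bits: List[List[int]] = field(default_factory=list)

    def has_tree_for_every_channel(self) -> bool:
        """Check that each input channel comes with its own tree description."""
        return len(self.tree_structure_bits) == len(self.audio_channels)


stream = AudioDataStream(
    audio_channels=[[0.0, 0.1, 0.2]],
    parametric_data=[{"power_ratio_db": 3.0}],
    tree_structure_bits=[[1, 0, 0]],
)
```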
- a storage medium having stored thereon a signal as described above.
- a method of generating a number of output audio channels comprising: receiving a data stream comprising a number of input audio channels and parametric audio data; the data stream further comprising decoder tree structure data for a hierarchical decoder structure, the decoder tree structure data comprising at least one data value indicative of channel split characteristics for an audio channel at a hierarchical layer of the hierarchical decoder structure; generating the hierarchical decoder structure in response to the decoder tree structure data; and generating the number of output audio channels from the data stream using the hierarchical decoder structure.
- a method of generating a data stream comprising a number of output audio channels comprising: receiving a number of input audio channels; hierarchical encoding means parametrically encoding the number of input audio channels to generate the data stream comprising the number of output audio channels and parametric audio data; determining a hierarchical decoder structure corresponding to the hierarchical encoding means; and including decoder tree structure data comprising at least one data value indicative of a channel split characteristic for an audio channel at a hierarchical layer of the hierarchical decoder structure in the data stream.
- receiver for generating a number of output audio channels; the receiver comprising: means for receiving a data stream comprising a number of input audio channels and parametric audio data; the data stream further comprising decoder tree structure data for a hierarchical decoder structure, the decoder tree structure data comprising at least one data value indicative of channel split characteristics for an audio channel at a hierarchical layer of the hierarchical decoder structure; means for generating the hierarchical decoder structure in response to the decoder tree structure data; and means for generating the number of output audio channels from the data stream using the hierarchical decoder structure.
- transmitter for generating a data stream comprising a number of output audio channels; the transmitter comprising: means for receiving a number of input audio channels; hierarchical encoding means for parametrically encoding the number of input audio channels to generate the data stream comprising the number of output audio channels and parametric audio data; means for determining a hierarchical decoder structure corresponding to the hierarchical encoding means; and means for including decoder tree structure data comprising at least one data value indicative of a channel split characteristic for an audio channel at a hierarchical layer of the hierarchical decoder structure in the data stream.
- transmission system comprising a transmitter for generating a data stream and a receiver for generating a number of output audio channels; wherein the transmitter comprises: means for receiving a number of input audio channels, hierarchical encoding means for parametrically encoding the number of input audio channels to generate the data stream comprising the number of audio channels and parametric audio data, means for determining a hierarchical decoder structure corresponding to the hierarchical encoding means, means for including decoder tree structure data comprising at least one data value indicative of a channel split characteristic for an audio channel at a hierarchical layer of the hierarchical decoder structure in the data stream, and means for transmitting the data stream to the receiver; and the receiver comprises: means for receiving the data stream, means for generating the hierarchical decoder structure in response to the decoder tree structure data, and means for generating the number of output audio channels from the data stream using the hierarchical decoder structure.
- a method of receiving a data stream comprising: receiving a data stream comprising a number of input audio channels and parametric audio data; the data stream further comprising decoder tree structure data for a hierarchical decoder structure, the decoder tree structure data comprising at least one data value indicative of channel split characteristics for an audio channel at a hierarchical layer of the hierarchical decoder structure; generating the hierarchical decoder structure in response to the decoder tree structure data; and generating the number of output audio channels from the data stream using the hierarchical decoder structure.
- a method of transmitting a data stream comprising a number of output audio channels comprising: receiving a number of input audio channels; parametrically encoding the number of input audio channels to generate the data stream comprising the number of output audio channels and parametric audio data; determining a hierarchical decoder structure corresponding to the hierarchical encoding means; including decoder tree structure data comprising at least one data value indicative of a channel split characteristic for an audio channel at a hierarchical layer of the hierarchical decoder structure in the data stream; and transmitting the data stream.
- a method of transmitting and receiving a data stream comprising: at a transmitter: receiving a number of input audio channels, parametrically encoding the number of input audio channels to generate the data stream comprising the number of audio channels and parametric audio data, determining a hierarchical decoder structure corresponding to the hierarchical encoding means, including decoder tree structure data comprising at least one data value indicative of a channel split characteristic for an audio channel at a hierarchical layer of the hierarchical decoder structure in the data stream, and transmitting the data stream to the receiver; and at a receiver: receiving the data stream, generating the hierarchical decoder structure in response to the decoder tree structure data, and generating the number of output audio channels from the data stream using the hierarchical decoder structure.
- computer program product for executing any of the methods described above.
- an audio playing device comprising an apparatus as described above.
- an audio recording device comprising an apparatus as described above.
- Fig. 1 illustrates a transmission system for communication of an audio signal in accordance with some embodiments of the invention
- Fig. 2 illustrates an example of a hierarchical encoder structure that may be employed in some embodiments of the invention
- Fig. 3 illustrates an example of an encoder in accordance with some embodiments of the invention
- Fig. 4 illustrates an example of a decoder in accordance with some embodiments of the invention
- Fig. 5 illustrates an example of some hierarchical decoder structures that may be employed in some embodiments of the invention
- Fig. 6 illustrates example hierarchical decoder structures having two-to-three up-mixers at the root
- Fig. 7 illustrates an example hierarchical decoder structure comprising a plurality of decoder tree structures
- Fig. 8 illustrates an example of a one-to-two up-mixer
- Fig. 9 illustrates an example of some hierarchical decoder structures that may be employed in some embodiments of the invention
- Fig. 10 illustrates an example of some hierarchical decoder structures that may be employed in some embodiments of the invention
- Fig. 11 illustrates an exemplary flow chart for a method of decoding in accordance with some embodiments of the invention
- Fig. 12 illustrates an example of a matrix decoder structure in accordance with some embodiments of the invention
- Fig. 13 illustrates an example of a hierarchical decoder structure that may be employed in some embodiments of the invention
- Fig. 14 illustrates an example of a hierarchical decoder structure that may be employed in some embodiments of the invention
- Fig. 15 illustrates a method of transmitting and receiving an audio signal in accordance with some embodiments of the invention.
- Fig. 1 illustrates a transmission system 100 for communication of an audio signal in accordance with some embodiments of the invention.
- the transmission system 100 comprises a transmitter 101 which is coupled to a receiver 103 through a network 105 which specifically may be the Internet.
- the transmitter 101 is a signal recording device and the receiver 103 is a signal player device, but it will be appreciated that in other embodiments a transmitter and receiver may be used in other applications and for other purposes.
- the transmitter 101 and/or the receiver 103 may be part of a transcoding functionality and may e.g. provide interfacing to other signal sources or destinations.
- the transmitter 101 comprises a digitizer 107 which receives an analog signal that is converted to a digital PCM signal by sampling and analog-to-digital conversion.
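The conversion to a PCM signal mentioned above amounts to mapping each sampled amplitude onto a fixed set of integer levels. The sketch below quantizes already-sampled values in the range [-1.0, 1.0] to 16-bit integers. The bit depth, the clipping behaviour and the function name are assumptions; the source does not state the PCM format used by the digitizer 107.

```python
def to_pcm16(samples):
    """Quantize sampled amplitudes in [-1.0, 1.0] to 16-bit signed PCM values.

    Assumed format: symmetric scaling by 32767 with clipping of out-of-range input.
    """
    pcm = []
    for x in samples:
        clipped = max(-1.0, min(1.0, x))  # avoid wrap-around on overload
        pcm.append(int(round(clipped * 32767)))
    return pcm


pcm = to_pcm16([0.0, 1.0, -1.0, 2.0])
```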
- the digitizer 107 is coupled to the encoder 109 of Fig. 1 which encodes the PCM signal in accordance with an encoding algorithm.
- the encoder 109 is coupled to a network transmitter 111 which receives the encoded signal and interfaces to the Internet 105.
- the network transmitter may transmit the encoded signal to the receiver 103 through the Internet 105.
- the receiver 103 comprises a network receiver 113 which interfaces to the Internet 105 and which is arranged to receive the encoded signal from the transmitter 101.
- the network receiver 113 is coupled to a decoder 115.
- the decoder 115 receives the encoded signal and decodes it in accordance with a decoding algorithm.
- the receiver 103 further comprises a signal player 117 which receives the decoded audio signal from the decoder 115 and presents this to the user.
- the signal player 117 may comprise a digital-to-analog converter, amplifiers and speakers as required for outputting the decoded audio signal.
- the encoder 109 and decoder 115 use a cascaded or tree-based structure consisting of small building blocks.
- the encoder 109 thus uses a hierarchical encoding structure wherein the audio channels are progressively processed in different layers of the hierarchical structure. Such a structure may lead to a particularly advantageous encoding with high audio quality yet relatively low complexity and easy implementation of the encoder 109.
- Fig. 2 illustrates an example of a hierarchical encoder structure that may be employed in some embodiments of the invention.
- the encoder 109 encodes a 5.1 channel surround sound input signal consisting of a left front (lf), left surround (ls), right front (rf), right surround (rs), center (c0) and a subwoofer or Low Frequency Enhancement (lfe) channel.
- the channels are first segmented and transformed to the frequency domain in the segmentation blocks 201.
- the resulting frequency domain signals are fed pairwise to Two-To-One (TTO) down-mixers 203 which down-mix two input signals into a single output channel and extract the corresponding parameters.
- the three TTO down-mixers 203 down-mix the six input channels to three audio channels and parameters.
- the outputs of the TTO down-mixers 203 are used as inputs for the other TTO down-mixers 205, 207.
- two of the TTO down-mixers 203 are coupled to a fourth TTO down-mixer 205 which combines the corresponding channels into a single channel.
- the third of the TTO down-mixers 203 is together with the fourth TTO down-mixer 205 coupled to a fifth TTO down-mixer 207 which combines the remaining two channels into a single channel (M).
- the TTO down-mixers 203 may be considered to comprise the first layer of the encoding structure, with a second layer comprising the fourth TTO down-mixer 205 and the third layer comprising the fifth TTO down-mixer 207.
- a combination of a number of audio channels into a lower number of audio channels is taking place in each layer of the hierarchical encoder structure.
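The three-layer structure of Fig. 2 can be sketched as follows. This is only a minimal illustration: `tto_downmix` is a hypothetical helper that averages two channels and records a single level-difference parameter in dB, and the pairing of channels is assumed. The actual down-mix rule and parameter set of a TTO down-mixer are not specified in this passage.

```python
import math

def tto_downmix(a, b):
    """Hypothetical TTO down-mixer: returns (mono, params) for two equal-length channels."""
    eps = 1e-12  # avoids log of zero for silent channels
    energy_a = sum(x * x for x in a) + eps
    energy_b = sum(x * x for x in b) + eps
    mono = [0.5 * (x + y) for x, y in zip(a, b)]
    return mono, {"level_diff_db": 10.0 * math.log10(energy_a / energy_b)}

def encode_5_1(lf, ls, rf, rs, co, lfe):
    """Three layers: 6 -> 3 -> 2 -> 1 channels, one parameter set per TTO down-mixer."""
    params = []
    # Layer 1: three TTO down-mixers (203); the pairing is assumed for illustration
    layer1 = []
    for a, b in ((lf, ls), (rf, rs), (co, lfe)):
        mono, p = tto_downmix(a, b)
        layer1.append(mono)
        params.append(p)
    # Layer 2: the fourth TTO down-mixer (205) combines two of the layer-1 outputs
    combined, p = tto_downmix(layer1[0], layer1[1])
    params.append(p)
    # Layer 3: the fifth TTO down-mixer (207) produces the single channel M
    m, p = tto_downmix(combined, layer1[2])
    params.append(p)
    return m, params
```

Six input channels thus yield one down-mix channel and five parameter sets, one per down-mixer.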
- the hierarchical encoding structure of the encoder 109 may result in very efficient and high quality encoding for low complexity. Furthermore, the hierarchical encoding structure may be varied depending on the nature of the signal which is encoded. For example, if a simple stereo signal is encoded, this may be achieved by a hierarchical encoding structure comprising only a single TTO down-mixer and a single layer.
- In order for the decoder 115 to handle signals encoded using different hierarchical encoding structures, it must be able to adapt to the hierarchical encoding structure used for the specific signal. Specifically, the decoder 115 comprises functionality for configuring itself to have a hierarchical decoder structure that matches the hierarchical encoding structure of the encoder 109. However, in order to do so, the decoder 115 must be provided with information of the hierarchical encoding structure used for encoding the received bitstream.
- Fig. 3 illustrates an example of the encoder 109 in accordance with some embodiments of the invention.
- the encoder 109 comprises a receive processor 301 which receives a number of input audio channels. For the specific example of Fig. 2, the encoder 109 receives six input channels.
- the receive processor 301 is coupled to an encode processor 303 which has a hierarchical encoding structure. As an example, the hierarchical encoding structure of the encode processor 303 may correspond to that illustrated in Fig. 2.
- the encode processor 303 is furthermore coupled to an encoding structure processor 305 which is arranged to determine the hierarchical encoding structure used by the encode processor 303.
- the encode processor 303 may specifically feed structure data to the encoding structure processor 305.
- the encoding structure processor 305 generates decoder tree structure data which is indicative of the hierarchical decoder structure that must be used by the decoder to decode the encoded signal generated by the encode processor 303.
- the decoder tree structure data may directly be determined as data describing the hierarchical encoding structure or may e.g. be data which directly describes the hierarchical decoder structure that must be used (e.g. it may describe the complementary structure to that of the encode processor 303).
- the decoder tree structure data specifically comprises at least one data value indicative of a channel split characteristic for an audio channel at hierarchical layers of the hierarchical decoder structure.
- the decoder tree structure data may comprise at least one indication of where an audio channel must be split in the decoder.
- Such an indication may for example be an indication of a layer in which the encoding structure comprises a down-mixer or may equivalently be an indication of a layer of the decoder tree structure that must comprise an up-mixer.
- the encode processor 303 and the encoding structure processor 305 are coupled to a data stream generator 307 which generates a bit stream comprising the encoded audio from the encode processor 303 and the decoder tree structure data from the encoding structure processor 305. This data stream is then fed to the network transmitter 111 for communication to the receiver 103.
- Fig. 4 illustrates an example of the decoder 115 in accordance with some embodiments of the invention.
- the decoder 115 comprises a receiver 401 which receives the data stream transmitted from the network receiver 113.
- the decoder 115 furthermore comprises a decode processor 403 and a decoder structure processor 405 coupled to the receiver 401.
- the receiver 401 extracts the decoder tree structure data and feeds this to the decoder structure processor 405 whereas the audio encoding data comprising a number of audio channels and the parametric audio data is fed to the decode processor 403.
- the decoder structure processor 405 is arranged to determine the hierarchical decoder structure in response to the received decoder tree structure data. Specifically, the decoder structure processor 405 may extract the data values specifying the data splits and may generate information of the hierarchical decoder structure that complements the hierarchical encoding structure of the encode processor 303. This information is fed to the decode processor 403 causing this to be configured for the specified hierarchical decoder structure.
- the decode processor 403 proceeds to generate the output channels corresponding to the original inputs to the encoder 109 using the hierarchical decoder structure.
- the system may allow an efficient and high quality encoding, decoding and distribution of audio signals and specifically of multi-channel audio signals.
- a very flexible system is enabled wherein decoders may automatically adapt to the encoders and the same decoders may thus be used with a number of different encoders.
- the decoder tree structure data is effectively communicated using data values which are indicative of channel split characteristics for the audio channels at the different hierarchical layers of the hierarchical decoder structure.
- the decoder tree structure data is optimized for flexible and high performance hierarchical encoding and decoding structures. For example, a 5.1 channel signal (i.e. a six channel signal) may be encoded as a stereo signal plus a set of spatial parameters.
- Such encoding can be achieved by many different hierarchical encoding structures that use simple TTO or Three-To-Two (TTT) down-mixers and thus many different hierarchical decoder structures are possible using One-To-Two (OTT) or Two-To-Three (TTT) up-mixers.
- decoder tree structure data where data values indicate channel splits at the different layers of the hierarchical decoder structure allows a simple general communication of the decoder tree structure data which may describe any hierarchical decoder structure.
- new encoding structures may readily be used without requiring any prior notification of the corresponding decoders.
- the system of Fig. 1 can handle an arbitrary number of input and output channels while maintaining full flexibility. This is achieved by specifying a description of the encoder/decoder tree in the bit-stream. From this description the decoder can derive where and how to apply the subsequent parameters encoded in the bit stream.
- the decoder tree structure data may specifically comprise a plurality of data values where each data value is indicative of a channel split characteristic for one channel at one hierarchical layer of the hierarchical decoder structure.
- the decoder tree structure data may comprise one data value for each up-mixer to be included in the hierarchical decoder structure.
- one data value may be included for each channel which is not to be split further. Thus, if a data value of the decoder tree structure data has a value corresponding to one specific predetermined data value this may indicate that the corresponding channel is not to be split further but is in fact an output channel of the decoder 115.
- the system may only incorporate encoders which exclusively use TTO down-mixers and the decoder may accordingly be implemented using only OTT up-mixers.
- a data value may be included for each channel of the decoder.
- the data value may take on one of two possible values with one value indicating that the channel is not split and the other value indicating that the channel is split into two channels by an OTT up-mixer.
- the order of the data values in the decoder tree structure data may indicate which channels are split and thus the location of the OTT up-mixers in the hierarchical decoder structure.
- a decoder tree structure data comprising simple binary values completely describing the required hierarchical decoder structure may be achieved.
- the derivation of a bitstring description of the hierarchical decoder structure of the decoder of Fig. 5 will be described.
- encoders may only use TTO down-mixers and thus the decoder tree may be described by a binary string.
- a single input audio channel is expanded to a five channel output signal using OTT up-mixers.
- four layers of depth can be discerned, the first, denoted with 0, is at the layer of the input signal, the last, denoted with 3, is at the layer of the output signals.
- the layers are characterized by the audio channels with the up-mixers forming the layer boundaries, the layers may equivalently be considered to comprise or be formed by the up-mixers.
- the hierarchical decoder structure of Fig. 5 may be described by the bit string "111001000" derived by the following steps:
- 1 - The input signal at layer 0, $t_0$, is split (OTT up-mixer A). As a result, all signals at layer 0 are accounted for; move on to layer 1.
- 1 - The first signal at layer 1 (coming out of the top of OTT up-mixer A) is split (OTT up-mixer B).
- 1 - The second signal at layer 1 (coming out of the bottom of OTT up-mixer A) is split (OTT up-mixer C). All signals at layer 1 are described; move on to layer 2.
- 0 - The first signal at layer 2 (top of OTT up-mixer B) is not split any further.
- 0 - The second signal at layer 2 (bottom of OTT up-mixer B) is not split any further.
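The layer-by-layer reading of such a bit string can be sketched as a breadth-first parse. This is an illustrative sketch rather than the patent's own procedure: the function name `parse_ott_tree` and the dictionary-based node layout are assumptions, and only OTT up-mixers are considered.

```python
from collections import deque

def parse_ott_tree(bits):
    """Breadth-first parse: '1' = channel is split by an OTT up-mixer, '0' = output channel."""
    root = {"layer": 0, "children": []}
    queue = deque([root])
    outputs = []
    for bit in bits:
        if not queue:
            raise ValueError("more bits than channels to describe")
        node = queue.popleft()
        if bit == "1":
            for _ in range(2):  # an OTT up-mixer yields an upper and a lower channel
                child = {"layer": node["layer"] + 1, "children": []}
                node["children"].append(child)
                queue.append(child)
        elif bit == "0":
            outputs.append(node)
        else:
            raise ValueError("bit string may only contain '0' and '1'")
    if queue:
        raise ValueError("bit string ended before all channels were described")
    return root, outputs

root, outputs = parse_ott_tree("111001000")
print(len(outputs))                   # 5 output channels
print([n["layer"] for n in outputs])  # [2, 2, 2, 3, 3]
```

Because every channel receives exactly one bit, a string with four ones (four OTT up-mixers) always has five zeros, which matches the five output channels of the example.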
- the encoding may be limited to using only TTO and TTT down-mixers and thus the decoding may be limited to using only OTT and TTT up-mixers.
- the TTT up-mixers may be used in many different configurations, it is particularly advantageous to use them in a mode where (waveform) prediction is used to accurately estimate the three output signals from the two input signals. Due to this predictive nature of the TTT up-mixers, the logical position for these up-mixers is at the root of the tree. This is a consequence of the OTT up-mixers destroying the original waveform thereby making prediction unsuitable.
- the only up-mixers that are used in the decoder structure are OTT up-mixers or TTT up-mixers in the root layer.
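- three types of decoder tree structure can be distinguished: 1) trees that have a TTT up-mixer as root; 2) trees consisting only of OTT up-mixers; 3) empty trees, i.e. a direct connection from an input channel to an output channel.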
- Fig. 6 illustrates example hierarchical decoder structures having TTT up-mixers at the root and Fig. 7 illustrates an example hierarchical decoder structure comprising a plurality of decoder tree structures.
- the hierarchical decoder structure of Fig. 7 comprises decoder tree structures according to all three examples presented above.
- the decoder tree structure data is ordered according to whether or not the tree of an input channel comprises a TTT up-mixer.
- the decoder tree structure data may comprise an indication of a TTT up-mixer being present at the root layer followed by binary data indicative of whether the channels of the lower layers are split by an OTT up-mixer or are not split further. This may improve performance in terms of bit-rate and signaling cost.
- the decoder tree structure data may indicate how many TTT up-mixers are included in the hierarchical decoder structure.
- each tree structure may only comprise one TTT up-mixer, which is located at the root level. The remainder of the tree may be described by a binary string as described previously (i.e. as the tree is an OTT up-mixer only tree for the lower layers, the same approach as described for an OTT up-mixer only hierarchical decoder structure can be applied).
- the remaining tree structures are either OTT up-mixer only trees or empty trees which can also be described by binary strings.
- all trees can be described by binary data values and the interpretation of the binary string may depend on which category the tree belongs to.
- This information may be provided by the location of the tree in the decoder tree structure data. For example, all trees comprising a TTT up-mixer may be located first in the decoder tree structure data, followed by the OTT up-mixer only trees, followed by the empty trees. If the number of TTT up-mixers and OTT up-mixers in the hierarchical decoder structure is included in the decoder tree structure data, the decoder can be configured without requiring any further data. Thus, a highly efficient communication of information of the required decoder structure is achieved. The overhead of communicating the decoder tree structure data may be kept very low, yet a highly flexible system is provided which may describe a wide variety of hierarchical decoder structures.
- the hierarchical decoder structures of the decoder of Fig. 7 may be derived from decoder tree structure data by the following process:
- the number of input signals is derived from the (possibly encoded) down-mix.
- the number of OTT up-mixers and TTT up-mixers of the whole tree are signaled in the decoder tree structure data and may be extracted therefrom.
- the input channels may be remapped in the decoder tree structure data such that after remapping first the trees according to situation 1) are encountered, followed by the trees according to situation 2) and then 3). For the example of Fig. 7 this would result in the order 3, 0, 1, 2, 4, i.e., signal 0 is signal 3 after remapping, signal 1 is signal 0 after remapping, etc.
- OTT-only tree descriptions are given using the method described above, one OTT-only tree per TTT output channel.
- For all remaining input signals OTT-only descriptions are given.
- an indication of a loudspeaker position for the output channels is included in the decoder tree structure data.
- a look-up table of predetermined loudspeaker locations may be used, such as for example:
- the loudspeaker locations can be represented using a hierarchical approach.
- a few first bits specify the x-axis, e.g. L, R, C
- another few bits specify the y-axis, e.g. Front, Side, Surround
- another few bits specify the z-axis (elevation).
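A hierarchical position code of this kind could be packed as in the sketch below. The field width of two bits per axis, the elevation labels and the helper names are all assumptions made for illustration; the text only states that a few bits are used per axis.

```python
X_AXIS = ["L", "R", "C"]                # labels taken from the text
Y_AXIS = ["Front", "Side", "Surround"]  # labels taken from the text
Z_AXIS = ["Ear level", "Elevated"]      # hypothetical elevation labels
BITS = 2                                # assumed field width per axis

def pack_position(x, y, z):
    """Concatenate the x, y and z indices into one code, x in the leading bits."""
    code = X_AXIS.index(x)
    code = (code << BITS) | Y_AXIS.index(y)
    code = (code << BITS) | Z_AXIS.index(z)
    return code

def unpack_position(code):
    """Recover the (x, y, z) labels from a packed code."""
    mask = (1 << BITS) - 1
    z = Z_AXIS[code & mask]
    y = Y_AXIS[(code >> BITS) & mask]
    x = X_AXIS[(code >> (2 * BITS)) & mask]
    return x, y, z
```

Placing the x-axis in the leading bits means a decoder that only needs left/right/center information can stop reading after the first field.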
- bit stream syntax for a bit-stream following the described guidelines above.
- the number of input and output signals is explicitly coded in the bit-stream. Such information can be used to validate part of the bit-stream.
- numInChan = bsNumInChan + 1
- numOutChan = bsNumOutChan + 2
- numTttUp_mixers = bsNumTttUp_mixers
- numOttUp_mixers = bsNumOttUp_mixers
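One way the explicitly coded channel counts could be used for validation is sketched below. The function names are hypothetical and the offsets are taken as they appear in the lines above. The check relies only on the fact that an OTT up-mixer turns one channel into two and a TTT up-mixer turns two channels into three, so each up-mixer adds exactly one channel.

```python
def derive_counts(bs_num_in_chan, bs_num_out_chan, bs_num_ttt, bs_num_ott):
    """Map coded field values to actual counts using the offsets quoted above."""
    return {
        "numInChan": bs_num_in_chan + 1,
        "numOutChan": bs_num_out_chan + 2,
        "numTtt": bs_num_ttt,
        "numOtt": bs_num_ott,
    }

def counts_are_consistent(c):
    """Each OTT (1 -> 2) and each TTT (2 -> 3) up-mixer adds exactly one channel."""
    return c["numOutChan"] == c["numInChan"] + c["numOtt"] + c["numTtt"]

# Fig. 5 style example: one input channel, four OTT up-mixers, five output channels
counts = derive_counts(0, 3, 0, 4)
print(counts_are_consistent(counts))  # True
```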
- the data may indicate whether an OTT up-mixer is an LFE (Low Frequency Enhancement) OTT up-mixer, i.e., whether the parameters are band-limited only and do not contain any correlation/coherence data.
- data may specify specific properties of the up-mixers, such as in the example of the TTT up-mixer, which mode to use (waveform based prediction, energy based description, etc.).
- an OTT up-mixer uses a de-correlated signal to split a single channel into two channels.
- the de-correlated signal is derived from the single input channel signal.
- Fig. 8 illustrates an example of an OTT up-mixer according to this approach.
- the exemplary decoder of Fig. 5 may be represented by the diagram of Fig. 9 wherein the de-correlator blocks generating the de-correlated signals are explicitly shown.
- this approach leads to a cascading of de-correlator blocks such that the de-correlated signal for a lower layer OTT up-mixer is generated from an input signal which has been generated from another de-correlated signal.
- the de-correlated signals of the lower layers will have been processed by several de-correlation blocks.
- each de-correlation block comprises a de-correlation filter.
- this approach may result in a "smearing" of the de-correlated signal (for example transients may be significantly distorted). This results in audio quality degradation for the output signal.
- the de-correlators applied in the decoder up-mix may therefore in some embodiments be moved such that a cascading of de-correlated signals is prevented.
- Fig. 10 illustrates an example of a decoder structure corresponding to that of Fig. 9 but with the de-correlators directly coupled to the input channel.
- instead of taking the output of the predecessor OTT up-mixer as input to the de-correlator, the de-correlator up-mixers directly take the original input signal $t_0$, pre-processed by the gain up-mixers $G_B$, $G_C$ and $G_D$.
- Fig. 11 illustrates an exemplary flow chart for a method of decoding in accordance with some embodiments of the invention.
- In step 1101, the quantized and coded parameters are decoded from the received bit-stream.
- this may result in a number of vectors of conventional parametric audio coding parameters, such as:
- $CLD_0$ = [-10 15 10 12 ... 10]
- $CLD_1$ = [5 12 15 10 ... 2]
- $ICC_0$ = [1 0.6 0.9 0.3 ... -1]
- $ICC_1$ = [0 1 0.6 0.9 ... 0.3] etc.
- Step 1101 is followed by step 1103 wherein the matrices for the individual up-mixers are determined from the decoded parametric data.
- the (frequency independent) generalized OTT and TTT matrices may respectively be given as:
- the signals $x_i$, $d_i$ and $y_i$ represent the input signals, the de-correlated signals derived from the signals $x_i$, and the output signals, respectively.
- the matrix entries $H_{ij}$ and $M_{ij}$ are functions of the parameters derived in step 1101.
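A plausible form of the generalized matrices in this notation is given below. It is a sketch that assumes one de-correlated signal per up-mixer, which is consistent with the later statements that the TTT matrix reduces to a 3x2 matrix when no de-correlated signal is used and that the de-correlated contribution is given by the third column of the TTT matrix.

```latex
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
=
\begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ d_1 \end{bmatrix},
\qquad
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
=
\begin{bmatrix}
M_{11} & M_{12} & M_{13} \\
M_{21} & M_{22} & M_{23} \\
M_{31} & M_{32} & M_{33}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ d_1 \end{bmatrix}
```

In both cases the first columns act on the dry input signals and the last column acts on the de-correlated signal.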
- the method then divides into two parallel paths wherein one path is directed to deriving tree-pre matrix values (step 1105) and one path is directed to deriving tree-mix matrix values (step 1107).
- the pre-matrices correspond to the matrix multiplications applied to the input signal before the de-correlation and the matrix application. Specifically, the pre-matrices correspond to the gain up-mixers applied to the input signal prior to the de-correlation filters.
- a straightforward decoder implementation will in general lead to a cascade of de-correlation filters, as e.g. applied in Fig. 9. As explained above, it is preferable to prevent this cascading. In order to do so, the de-correlation filters are all moved to the same hierarchical level as shown in Fig. 10. In order to assure that the de-correlated signals have the appropriate energy level, i.e., identical to the level of the de-correlated signal in the straightforward case of Fig. 9, the pre-matrices are applied prior to the de-correlation.
- the gain $G_B$ in Fig. 10 is derived as follows.
- a 1-to-2 up-mixer divides the input signal power between the upper and lower outputs of the 1-to-2 up-mixer. This property is reflected in the Inter-channel Intensity Difference (IID) or Inter-channel Level Difference (ICLD) parameters.
- the gain $G_B$ is calculated as the energy ratio of the upper output divided by the sum of the upper and lower outputs of 1-to-2 up-mixer A. It will be appreciated that since the IID or ICLD parameters can be time- and frequency-variant, the gain may also vary both over time and frequency.
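The energy ratio described above can be sketched numerically. The sketch assumes that the level difference parameter is expressed in dB as the power ratio of the upper output to the lower output; that convention, and the function names, are assumptions rather than something stated in this passage.

```python
import math

def upper_energy_fraction(iid_db):
    """Fraction of the input energy routed to the upper output of a 1-to-2 up-mixer."""
    ratio = 10.0 ** (iid_db / 10.0)  # assumed: upper/lower power ratio given in dB
    return ratio / (1.0 + ratio)

def pre_gain(iid_db):
    """Amplitude gain whose square equals the upper energy fraction."""
    return math.sqrt(upper_energy_fraction(iid_db))

# One gain per parameter band, since the parameters vary over frequency (and time)
gains = [pre_gain(iid) for iid in (-10.0, 0.0, 10.0)]
```

With a level difference of 0 dB the energy is split evenly, so the energy fraction is 0.5 and the corresponding amplitude gain is the square root of 0.5.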
- the mix matrices are the matrices applied to the input signal by the up-mixers in order to generate the additional channels.
- the final pre- and mix-matrix equations are a result of a cascade of the OTT and TTT up-mixers. As the decoder structure has been amended to prevent a cascade of de-correlators, this must be taken into account when determining the final equations.
- the relationship between the matrix entries $H_{ij}$ and $M_{ij}$ and the final matrix equations is constant and a standard modification can be applied.
- Step 1105 is followed by step 1109 wherein the pre-matrices derived in step 1105 are mapped to the actual frequency grid that is applied to transform the time domain signal to the frequency domain (in step 1113).
- Step 1109 is followed by step 1111 wherein the frequency matrix parameters may be interpolated. Specifically, depending on whether or not the temporal update of the parameters corresponds to the update of the time-to-frequency transform of step 1113, interpolation may be applied.
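When parameters are updated less often than the transform produces time slots, the matrices can be interpolated between updates. The sketch below assumes plain linear interpolation of each matrix entry; the interpolation rule and the function name are assumptions, since the text only states that interpolation may be applied.

```python
def interpolate_matrices(prev, curr, num_slots):
    """Return one matrix per time slot, moving linearly from prev towards curr."""
    result = []
    for slot in range(1, num_slots + 1):
        w = slot / num_slots  # weight reaches 1.0 at the slot of the new update
        result.append([[(1.0 - w) * p + w * c for p, c in zip(prow, crow)]
                       for prow, crow in zip(prev, curr)])
    return result
```

If the parameter update rate already matches the transform update rate, `num_slots` is 1 and the new matrix is used directly.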
- In step 1113, the input signals are converted to the frequency domain in order to apply the mapped and optionally interpolated pre-matrices.
- Step 1115 follows step 1111 and step 1113 and comprises applying the pre-matrices to the frequency domain input signals.
- the actual matrix application is a set of matrix multiplications.
- Step 1115 is followed by step 1117 wherein part of the signals resulting from the matrix application of step 1115 is fed to a de-correlation filter to generate de-correlated signals.
- step 1107 is followed by step 1119 wherein the equations determined in step 1107 are mapped to the frequency grid of the time-to-frequency transform of step 1113.
- step 1119 is followed by step 1121 wherein the mix-matrix values are optionally interpolated, again depending on the temporal update of parameters and transform.
- Step 1123 is followed by step 1125 wherein the resulting output is transformed back to the time domain.
- Fig. 12 illustrates an example of a matrix decoder structure in accordance with some embodiments of the invention.
- Fig. 12 illustrates how the input downmix channels can be used to re-construct the multi-channel output. As outlined above, the process can be described by two matrix multiplications with intermediate decorrelation units.
- $M_1^{n,k}$ is a two-dimensional matrix mapping a certain number of input channels to a certain number of channels going into the decorrelators, and is defined for every time-slot $n$ and every subband $k$;
- $M_2^{n,k}$ is a two-dimensional matrix mapping a certain number of pre-processed channels to a certain number of output channels, and is defined for every time-slot $n$ and every hybrid subband $k$.
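The two matrix multiplications with intermediate decorrelation can be sketched for a single time-slot and subband as follows. The sketch assumes a mono down-mix whose first pre-matrix output bypasses the decorrelators, and it uses a sign flip as a stand-in for the decorrelation filters; a real de-correlator is a filter with memory, so `placeholder_decorrelator` is purely illustrative.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def placeholder_decorrelator(value):
    # Stand-in only: a real de-correlator is a filter with memory, not a sign flip.
    return -value

def upmix_slot(m1, m2, x):
    """Apply the pre-matrix, de-correlate all but the first (dry) element, apply the mix-matrix."""
    v = mat_vec(m1, x)
    w = [v[0]] + [placeholder_decorrelator(s) for s in v[1:]]
    return mat_vec(m2, w)

# One input channel, one decorrelator, two output channels (hypothetical values)
y = upmix_slot([[1.0], [0.7]], [[0.7, 0.3], [0.7, -0.3]], [2.0])
```

Keeping the decorrelators between the two matrices is what allows all of them to operate on gain-adjusted copies of the input rather than on each other's outputs.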
- decoder tree structures having only OTT up-mixers will be considered with reference to the exemplary tree of Fig. 13.
- For this type of tree it is beneficial to define a number of helper variables:
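- $$\mathrm{Tree}_1 = \begin{bmatrix} 0 & 1 & 2 & 3 & 4 \\ \cdot & 0 & 0 & 1 & 1 \\ \cdot & \cdot & \cdot & 0 & 0 \end{bmatrix}$$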
- each column of the Tree_1 matrix holds the OTT up-mixer indices that are encountered on the way to the OTT up-mixer of that column (i.e. in the example, the signal being input to the 4th OTT up-mixer has passed through the 0th and 1st OTT up-mixers, as given by the 5th column in the Tree_1 matrix).
- similarly, the signal being input to the 2nd OTT up-mixer has passed through the 0th OTT up-mixer, as given by the 3rd column in the Tree_1 matrix, and so on.
- the Tree_1_sign matrix corresponds to the Tree_1 matrix, and hence when a certain column and row in the Tree_1 matrix points out a certain OTT up-mixer, the same column and row in the Tree_1_sign matrix indicates whether the lower or upper part of that specific OTT up-mixer is used to reach the OTT up-mixer given in the first row of the specific column (i.e. in the example, the signal being input to the 4th OTT up-mixer has passed through the upper path of the 0th OTT up-mixer, as indicated by the 3rd row, 5th column in the Tree_1_sign matrix, and the lower path of the 1st OTT up-mixer, as indicated by the 2nd row, 5th column in the Tree_1_sign matrix).
- a temporary matrix $K_1$ describing the pre-matrix for only the de-correlated signals is then defined according to:
- the IID values are the Inter-channel Intensity Difference values obtained from the bitstream.
- the final pre-mix matrix $M_1$ is then constructed as:
- the pre-mix matrix needs to supply a "dry" input signal for all decorrelators in the OTT up-mixer, where the input signals have the level they would have had at the specific point in the tree where the decorrelator was situated prior to moving it in front of the tree.
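The level that a decorrelator input would have had at its original position in the tree can be sketched by multiplying branch gains along the path from the root. The tree topology in `PARENT`, the dB convention for the level differences and the helper names are assumptions chosen to match the example described in the text (the 4th OTT up-mixer is reached via the upper path of the 0th and the lower path of the 1st).

```python
import math

# Hypothetical tree: child OTT index -> (parent OTT index, path taken out of the parent).
# +1 = upper output of the parent, -1 = lower output; OTT 0 is the root.
PARENT = {1: (0, +1), 2: (0, -1), 3: (1, +1), 4: (1, -1)}

def path_to(ott):
    """List of (ancestor, sign) pairs from the nearest ancestor up to the root."""
    path = []
    while ott in PARENT:
        parent, sign = PARENT[ott]
        path.append((parent, sign))
        ott = parent
    return path

def branch_gain(iid_db, sign):
    """Amplitude gain of the upper (sign > 0) or lower branch of one up-mixer."""
    ratio = 10.0 ** (iid_db / 10.0)  # assumed: upper/lower power ratio given in dB
    share = ratio / (1.0 + ratio) if sign > 0 else 1.0 / (1.0 + ratio)
    return math.sqrt(share)

def decorrelator_pre_gain(ott, iid_db):
    """Level the input of `ott` would have at its original position in the tree."""
    gain = 1.0
    for ancestor, sign in path_to(ott):
        gain *= branch_gain(iid_db[ancestor], sign)
    return gain
```

The decorrelator of the root up-mixer has an empty path and therefore a pre-gain of 1, since it sees the down-mix signal directly.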
- the pre-matrix only applies a pre-gain for signals going into decorrelators, and the mixing of the decorrelator signals and the "dry" downmix signal takes place in the mix-matrix $M_2$, which will be elaborated on below. The first element of the pre-mix matrix gives an output that is directly coupled to the $M_2$ matrix (see Fig. 12, where the m/c line illustrates this).
- the $y^{n,k}$ vector will be as follows:
- $e_n$ denotes the decorrelator output from the n-th OTT box in Fig. 13.
- $Y_1$ is the upper output of the OTT box, $Y_2$ is the lower output, $X$ is the dry input signal and $Q$ is the decorrelator signal.
- the first row of the mix matrix $M_2$ can be observed.
- the first element of the first row in $M_2$ corresponds to the contribution of the "m" signal, and is the contribution to the output given by the upper outputs of OTT up-mixers 0, 1 and 3. Given the H matrix above, this corresponds to $H11_0$, $H11_1$ and $H11_3$, since the amount of dry signal for the upper output of an OTT box is given by the $H11$ element of the OTT up-mixer.
- the second element corresponds to the contribution of de-correlator D1, which according to the above is situated in OTT up-mixer 0. Hence, the contribution of this is $H11_1$, $H11_3$ and $H12_0$.
- the $H12_0$ element gives the decorrelator output from OTT up-mixer 0, and that signal is subsequently passed through OTT up-mixers 1 and 3, as part of the dry signal, and thus gain adjusted according to the $H11_1$ and $H11_3$ elements.
- the third element corresponds to the contribution of the de-correlator D2, which according to the above is situated in OTT up-mixer 1. Hence, the contribution of this is $H12_1$ and $H11_3$.
- the fifth element corresponds to the contribution of the de-correlator D3, which according to the above notation is situated in OTT up-mixer 3. Hence, the contribution of this is $H12_3$.
- the fourth and sixth elements of the first row are zero since no contribution of de-correlator D4 or D6 is part of the output channel corresponding to the first row in the matrix.
- this walk-through example makes it evident that the matrix elements can be deduced as products of OTT up-mixer matrix elements H.
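The product structure of the walk-through can be sketched for the first output channel. The numeric H values are hypothetical, and the function returns only the non-zero entries of the row (the dry contribution followed by one entry per de-correlator on the path); entries for de-correlators that are not on the path are zero and are omitted here.

```python
# Hypothetical 2x2 matrices [[H11, H12], [H21, H22]] for OTT up-mixers 0, 1 and 3.
H = {
    0: [[0.8, 0.6], [0.6, -0.8]],
    1: [[0.9, 0.4], [0.4, -0.9]],
    3: [[0.7, 0.7], [0.7, -0.7]],
}
PATH = [0, 1, 3]  # up-mixers on the way to the first output, all via the upper output

def first_row(h, path):
    """[dry gain, gain of the de-correlator in path[0], in path[1], ...] for the first output."""
    row = []
    dry = 1.0
    for ott in path:
        dry *= h[ott][0][0]             # H11 of every up-mixer on the path
    row.append(dry)
    for pos, ott in enumerate(path):
        gain = h[ott][0][1]             # H12 injects this up-mixer's de-correlator
        for later in path[pos + 1:]:
            gain *= h[later][0][0]      # afterwards it travels as part of the dry signal
        row.append(gain)
    return row

row = first_row(H, PATH)
```

For an output reached through a lower path, the H21 and H22 elements would take the place of H11 and H12 at the corresponding up-mixer.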
- the matrix Tree holds a column for every output channel, describing the indexes of the OTT up-mixers the signal must pass to reach each output channel.
- the matrix Tree_sign holds an indicator for every up-mixer in the tree to indicate if the upper (1) or lower (-1) path should be used to reach the current output channel.
- Tree mm , 1 -1 1 -1
- the Tree_depth vector holds the number of up-mixers that must be passed to get to a specific output channel.
- Tree_depth = [3 3 3 3 2 2]
- the Tree_elements vector holds the number of up-mixers in every sub-tree of the whole tree.
- the M 2 matrix can be defined.
- the matrix for a sub-tree k, creating N output channels from 1 input channel is defined according to:
- Tree depth (j) > 0
- the H elements are defined by the parameters corresponding to the OTT up-mixer with index Tree(p, j).
- TTT up-mixers at the root level are assumed, such as for example the decoder structure of Fig. 14.
- the up-mixers containing two variables $M1_i$ and $M2_i$ denote OTT trees and thus not necessarily single OTT up-mixers.
- the TTT up-mixers do not employ a de-correlated signal, i.e., the TTT matrix can be described as a 3x2 matrix:
$$\begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \\ M_{31} & M_{32} \end{bmatrix}$$
- two sets of pre-mix matrices are derived for each OTT tree, one describing the pre-matrixing for the first input signal of the TTT up-mixer and one describing the pre-matrixing for the second input signal of the TTT up-mixer. After application of both pre-matrixing blocks and de-correlation the signals can be summed.
- the output signals may thus be derived as the following:
- the contribution of the de-correlated signal can be added in the form of a post-process. After the TTT up-mixer de-correlated signal has been derived, the contribution to each output signal is simply the contribution given by the $[M_{13}, M_{23}, M_{33}]$ vector spread by the IIDs of each following OTT up-mixer.
- Fig. 15 illustrates a method of transmitting and receiving an audio signal in accordance with some embodiments of the invention.
- In step 1501, a transmitter receives a number of input audio channels.
- step 1501 is followed by step 1503 wherein the transmitter parametrically encodes the number of input audio channels to generate the data stream comprising the number of audio channels and parametric audio data.
- Step 1503 is followed by step 1505 wherein the hierarchical decoder structure corresponding to the hierarchical encoding means is determined.
- Step 1505 is followed by step 1507 wherein the transmitter includes decoder tree structure data comprising at least one data value indicative of a channel split characteristic for an audio channel at a hierarchical layer of the hierarchical decoder structure in the data stream.
- Step 1507 is followed by step 1509 wherein the transmitter transmits the data stream to the receiver.
- Step 1509 is followed by step 1511 wherein a receiver receives the data stream.
- Step 1511 is followed by step 1513 wherein the hierarchical decoder structure to be used by the receiver is determined in response to the decoder tree structure data.
- step 1513 is followed by step 1515 wherein the receiver generates the number of output audio channels from the data stream using the hierarchical decoder structure.
- an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Stereophonic System (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Signal Processing Not Specific To The Method Of Recording And Reproducing (AREA)
- Reduction Or Emphasis Of Bandwidth Of Signals (AREA)
- Diaphragms For Electromechanical Transducers (AREA)
- Amplifiers (AREA)
Priority Applications (12)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/995,538 US7966191B2 (en) | 2005-07-14 | 2006-07-07 | Method and apparatus for generating a number of output audio channels |
EP06766049A EP1902443B1 (en) | 2005-07-14 | 2006-07-07 | Audio encoding and decoding |
CN2006800255555A CN101223575B (en) | 2005-07-14 | 2006-07-07 | Audio encoding and decoding |
AT06766049T ATE433182T1 (en) | 2005-07-14 | 2006-07-07 | AUDIO CODING AND AUDIO DECODING |
KR1020107024547A KR101492826B1 (en) | 2005-07-14 | 2006-07-07 | Apparatus and method for generating a number of output audio channels, receiver and audio playing device comprising the apparatus, data stream receiving method, and computer-readable recording medium |
MX2008000504A MX2008000504A (en) | 2005-07-14 | 2006-07-07 | Audio encoding and decoding. |
JP2008521009A JP5097702B2 (en) | 2005-07-14 | 2006-07-07 | Audio encoding and decoding |
DE602006007139T DE602006007139D1 (en) | 2005-07-14 | 2006-07-07 | AUDIO CODING AND AUDIO CODING |
BRPI0613469-6A BRPI0613469A2 (en) | 2005-07-14 | 2006-07-07 | apparatus and methods for generating a number of audio output channels and a data stream, data stream, storage medium, receiver for generating a number of audio output channels, transmitter for generating a data stream, transmission system, methods of receiving and transmitting a data stream, computer program product, and audio playback and audio recording devices |
KR1020087003599A KR101496193B1 (en) | 2005-07-14 | 2006-07-07 | An apparatus and a method for generating output audio channels and a data stream comprising the output audio channels, a method and an apparatus of transmitting and receiving a data stream, and audio playing and recording devices |
PL06766049T PL1902443T3 (en) | 2005-07-14 | 2006-07-07 | Audio encoding and decoding |
US12/882,862 US8626503B2 (en) | 2005-07-14 | 2010-09-15 | Audio encoding and decoding |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05106466.5 | 2005-07-14 | ||
EP05106466 | 2005-07-14 |
Related Child Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/995,538 A-371-Of-International US7966191B2 (en) | 2005-07-14 | 2006-07-07 | Method and apparatus for generating a number of output audio channels |
US12/882,862 Continuation-In-Part US8626503B2 (en) | 2005-07-14 | 2010-09-15 | Audio encoding and decoding |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2007007263A2 (en) | 2007-01-18 |
WO2007007263A3 (en) | 2007-03-29 |
Family
ID=37467582
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2006/052309 WO2007007263A2 (en) | 2005-07-14 | 2006-07-07 | Audio encoding and decoding |
Country Status (14)
Country | Link |
---|---|
US (1) | US7966191B2 (en) |
EP (2) | EP2088580B1 (en) |
JP (2) | JP5097702B2 (en) |
KR (2) | KR101492826B1 (en) |
CN (2) | CN101223575B (en) |
AT (2) | ATE433182T1 (en) |
BR (1) | BRPI0613469A2 (en) |
DE (1) | DE602006007139D1 (en) |
ES (2) | ES2374309T3 (en) |
HK (1) | HK1154984A1 (en) |
MX (1) | MX2008000504A (en) |
PL (2) | PL2088580T3 (en) |
RU (2) | RU2418385C2 (en) |
WO (1) | WO2007007263A2 (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101492826B1 (en) * | 2005-07-14 | 2015-02-13 | 코닌클리케 필립스 엔.브이. | Apparatus and method for generating a number of output audio channels, receiver and audio playing device comprising the apparatus, data stream receiving method, and computer-readable recording medium |
KR101218776B1 (en) | 2006-01-11 | 2013-01-18 | 삼성전자주식회사 | Method of generating multi-channel signal from down-mixed signal and computer-readable medium |
EP2109861B1 (en) * | 2007-01-10 | 2019-03-13 | Koninklijke Philips N.V. | Audio decoder |
KR101464977B1 (en) * | 2007-10-01 | 2014-11-25 | 삼성전자주식회사 | Method of managing a memory and Method and apparatus of decoding multi channel data |
ES2963744T3 (en) * | 2008-10-29 | 2024-04-01 | Dolby Int Ab | Signal clipping protection using pre-existing audio gain metadata |
KR20110022251A (en) * | 2009-08-27 | 2011-03-07 | 삼성전자주식회사 | Method and apparatus for encoding/decoding stereo audio |
PL2491553T3 (en) | 2009-10-20 | 2017-05-31 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using an iterative interval size reduction |
CN102142924B (en) * | 2010-02-03 | 2014-04-09 | 中兴通讯股份有限公司 | Versatile audio code (VAC) transmission method and device |
CA2826018C (en) | 2011-03-28 | 2016-05-17 | Dolby Laboratories Licensing Corporation | Reduced complexity transform for a low-frequency-effects channel |
TWI618050B (en) | 2013-02-14 | 2018-03-11 | 杜比實驗室特許公司 | Method and apparatus for signal decorrelation in an audio processing system |
WO2014126688A1 (en) | 2013-02-14 | 2014-08-21 | Dolby Laboratories Licensing Corporation | Methods for audio signal transient detection and decorrelation control |
WO2014126689A1 (en) * | 2013-02-14 | 2014-08-21 | Dolby Laboratories Licensing Corporation | Methods for controlling the inter-channel coherence of upmixed audio signals |
JP6248186B2 (en) | 2013-05-24 | 2017-12-13 | ドルビー・インターナショナル・アーベー | Audio encoding and decoding method, corresponding computer readable medium and corresponding audio encoder and decoder |
KR101805327B1 (en) | 2013-10-21 | 2017-12-05 | 돌비 인터네셔널 에이비 | Decorrelator structure for parametric reconstruction of audio signals |
EP2866475A1 (en) | 2013-10-23 | 2015-04-29 | Thomson Licensing | Method for and apparatus for decoding an audio soundfield representation for audio playback using 2D setups |
WO2016066743A1 (en) * | 2014-10-31 | 2016-05-06 | Dolby International Ab | Parametric encoding and decoding of multichannel audio signals |
US10416954B2 (en) | 2017-04-28 | 2019-09-17 | Microsoft Technology Licensing, Llc | Streaming of augmented/virtual reality spatial audio/video |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004008805A1 (en) * | 2002-07-12 | 2004-01-22 | Koninklijke Philips Electronics N.V. | Audio coding |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3298478B2 (en) * | 1997-11-10 | 2002-07-02 | 日本電気株式会社 | MPEG decoding device |
JPH11330980A (en) * | 1998-05-13 | 1999-11-30 | Matsushita Electric Ind Co Ltd | Decoding device and method and recording medium recording decoding procedure |
JP2001268697A (en) * | 2000-03-22 | 2001-09-28 | Sony Corp | System, device, and method for data transmission |
WO2004019656A2 (en) | 2001-02-07 | 2004-03-04 | Dolby Laboratories Licensing Corporation | Audio channel spatial translation |
US6996523B1 (en) * | 2001-02-13 | 2006-02-07 | Hughes Electronics Corporation | Prototype waveform magnitude quantization for a frequency domain interpolative speech codec system |
SE0202159D0 (en) * | 2001-07-10 | 2002-07-09 | Coding Technologies Sweden Ab | Efficient and scalable parametric stereo coding for low bitrate applications |
DE60326782D1 (en) * | 2002-04-22 | 2009-04-30 | Koninkl Philips Electronics Nv | Decoding device with decorrelation unit |
US7502743B2 (en) * | 2002-09-04 | 2009-03-10 | Microsoft Corporation | Multi-channel audio encoding and decoding with multi-channel transform selection |
JP4676140B2 (en) * | 2002-09-04 | 2011-04-27 | マイクロソフト コーポレーション | Audio quantization and inverse quantization |
FR2852172A1 (en) * | 2003-03-04 | 2004-09-10 | France Telecom | Audio signal coding method, involves coding one part of audio signal frequency spectrum with core coder and another part with extension coder, where part of spectrum is coded with both core coder and extension coder |
US7447317B2 (en) * | 2003-10-02 | 2008-11-04 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V | Compatible multi-channel coding/decoding by weighting the downmix channel |
KR100571824B1 (en) * | 2003-11-26 | 2006-04-17 | 삼성전자주식회사 | Method for encoding/decoding of embedding the ancillary data in MPEG-4 BSAC audio bitstream and apparatus using thereof |
JPWO2005081229A1 (en) * | 2004-02-25 | 2007-10-25 | 松下電器産業株式会社 | Audio encoder and audio decoder |
ES2426917T3 (en) * | 2004-04-05 | 2013-10-25 | Koninklijke Philips N.V. | Encoder, decoder, methods and associated audio system |
EP1736965B1 (en) * | 2004-04-28 | 2008-07-30 | Matsushita Electric Industrial Co., Ltd. | Hierarchy encoding apparatus and hierarchy encoding method |
KR101492826B1 (en) * | 2005-07-14 | 2015-02-13 | 코닌클리케 필립스 엔.브이. | Apparatus and method for generating a number of output audio channels, receiver and audio playing device comprising the apparatus, data stream receiving method, and computer-readable recording medium |
JP5321820B2 (en) * | 2009-04-23 | 2013-10-23 | セイコーエプソン株式会社 | Paper transport device |
2006
- 2006-07-07 KR KR1020107024547A patent/KR101492826B1/en active IP Right Grant
- 2006-07-07 MX MX2008000504A patent/MX2008000504A/en active IP Right Grant
- 2006-07-07 US US11/995,538 patent/US7966191B2/en active Active
- 2006-07-07 PL PL09005485T patent/PL2088580T3/en unknown
- 2006-07-07 KR KR1020087003599A patent/KR101496193B1/en active IP Right Grant
- 2006-07-07 EP EP09005485A patent/EP2088580B1/en active Active
- 2006-07-07 RU RU2008105556/09A patent/RU2418385C2/en active
- 2006-07-07 PL PL06766049T patent/PL1902443T3/en unknown
- 2006-07-07 JP JP2008521009A patent/JP5097702B2/en active Active
- 2006-07-07 CN CN2006800255555A patent/CN101223575B/en active Active
- 2006-07-07 BR BRPI0613469-6A patent/BRPI0613469A2/en not_active Application Discontinuation
- 2006-07-07 AT AT06766049T patent/ATE433182T1/en not_active IP Right Cessation
- 2006-07-07 WO PCT/IB2006/052309 patent/WO2007007263A2/en active Application Filing
- 2006-07-07 ES ES09005485T patent/ES2374309T3/en active Active
- 2006-07-07 CN CN2010102983019A patent/CN102013256B/en active Active
- 2006-07-07 DE DE602006007139T patent/DE602006007139D1/en active Active
- 2006-07-07 ES ES06766049T patent/ES2327158T3/en active Active
- 2006-07-07 EP EP06766049A patent/EP1902443B1/en active Active
- 2006-07-07 AT AT09005485T patent/ATE523877T1/en not_active IP Right Cessation
2010
- 2010-09-08 RU RU2010137467/08A patent/RU2461078C2/en active
- 2010-11-15 JP JP2010254409A patent/JP5269039B2/en active Active
2011
- 2011-08-30 HK HK11109165.7A patent/HK1154984A1/en unknown
Non-Patent Citations (1)
Title |
---|
HERRE J ET AL: "THE REFERENCE MODEL ARCHITECTURE FOR MPEG SPATIAL AUDIO CODING" AUDIO ENGINEERING SOCIETY CONVENTION PAPER, NEW YORK, NY, US, 28 May 2005 (2005-05-28), pages 1-13, XP009059973 * |
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7761177B2 (en) | 2005-07-29 | 2010-07-20 | Lg Electronics Inc. | Method for generating encoded audio signal and method for processing audio signal |
US7693183B2 (en) * | 2005-07-29 | 2010-04-06 | Lg Electronics Inc. | Method for signaling of splitting information |
US7706905B2 (en) | 2005-07-29 | 2010-04-27 | Lg Electronics Inc. | Method for processing audio signal |
US7702407B2 (en) | 2005-07-29 | 2010-04-20 | Lg Electronics Inc. | Method for generating encoded audio signal and method for processing audio signal |
US7693706B2 (en) | 2005-07-29 | 2010-04-06 | Lg Electronics Inc. | Method for generating encoded audio signal and method for processing audio signal |
EP2002425A1 (en) * | 2006-04-03 | 2008-12-17 | LG Electronics, Inc. | Apparatus for processing media signal and method thereof |
WO2007114624A1 (en) | 2006-04-03 | 2007-10-11 | Lg Electronics, Inc. | Apparatus for processing media signal and method thereof |
EP2002425A4 (en) * | 2006-04-03 | 2014-06-25 | Lg Electronics Inc | Apparatus for processing media signal and method thereof |
US20100241434A1 (en) * | 2007-02-20 | 2010-09-23 | Kojiro Ono | Multi-channel decoding device, multi-channel decoding method, program, and semiconductor integrated circuit |
EP2130304A1 (en) * | 2007-03-16 | 2009-12-09 | LG Electronics Inc. | A method and an apparatus for processing an audio signal |
WO2008114984A1 (en) | 2007-03-16 | 2008-09-25 | Lg Electronics Inc. | A method and an apparatus for processing an audio signal |
US8712060B2 (en) | 2007-03-16 | 2014-04-29 | Lg Electronics Inc. | Method and an apparatus for processing an audio signal |
US9373333B2 (en) | 2007-03-16 | 2016-06-21 | Lg Electronics Inc. | Method and apparatus for processing an audio signal |
EP2130304A4 (en) * | 2007-03-16 | 2012-04-04 | Lg Electronics Inc | A method and an apparatus for processing an audio signal |
US8725279B2 (en) | 2007-03-16 | 2014-05-13 | Lg Electronics Inc. | Method and an apparatus for processing an audio signal |
JP2011501230A (en) * | 2007-10-22 | 2011-01-06 | 韓國電子通信研究院 | Multi-object audio encoding and decoding method and apparatus |
JP2012212160A (en) * | 2007-10-22 | 2012-11-01 | Korea Electronics Telecommun | Multi-object audio encoding and decoding method and apparatus thereof |
RU2526745C2 (en) * | 2009-12-16 | 2014-08-27 | Долби Интернешнл Аб | Sbr bitstream parameter downmix |
KR101370870B1 (en) * | 2009-12-16 | 2014-03-07 | 돌비 인터네셔널 에이비 | Sbr bitstream parameter downmix |
JP2013210674A (en) * | 2009-12-16 | 2013-10-10 | Dolby International Ab | Sbr bit stream parameter down-mix |
AU2010332925B2 (en) * | 2009-12-16 | 2013-07-11 | Dolby International Ab | SBR bitstream parameter downmix |
US9508351B2 (en) | 2009-12-16 | 2016-11-29 | Dobly International AB | SBR bitstream parameter downmix |
WO2011073201A3 (en) * | 2009-12-16 | 2011-10-06 | Dolby International Ab | Sbr bitstream parameter downmix |
CN105531760A (en) * | 2013-09-12 | 2016-04-27 | 杜比国际公司 | Methods and devices for joint multichannel coding |
EP4339944A3 (en) * | 2013-09-12 | 2024-05-29 | Dolby International AB | Methods and devices for joint multichannel coding |
JP2016535316A (en) * | 2013-09-12 | 2016-11-10 | ドルビー・インターナショナル・アーベー | Method and apparatus for joint multi-channel coding |
WO2015036351A1 (en) * | 2013-09-12 | 2015-03-19 | Dolby International Ab | Methods and devices for joint multichannel coding |
AU2014320540B2 (en) * | 2013-09-12 | 2017-09-28 | Dolby International Ab | Methods and devices for joint multichannel coding |
EP3989221A1 (en) * | 2013-09-12 | 2022-04-27 | Dolby International AB | Methods and devices for joint multichannel coding |
RU2653285C2 (en) * | 2013-09-12 | 2018-05-07 | Долби Интернэшнл Аб | Methods and devices for joint multichannel coding |
TWI671734B (en) * | 2013-09-12 | 2019-09-11 | 瑞典商杜比國際公司 | Decoding method, encoding method, decoding device, and encoding device in multichannel audio system comprising three audio channels, computer program product comprising a non-transitory computer-readable medium with instructions for performing decoding m |
EP3330963A1 (en) * | 2013-09-12 | 2018-06-06 | Dolby International AB | Methods and devices for joint multichannel coding |
TWI634547B (en) * | 2013-09-12 | 2018-09-01 | 瑞典商杜比國際公司 | Decoding method, decoding device, encoding method, and encoding device in multichannel audio system comprising at least four audio channels, and computer program product comprising computer-readable medium |
TWI713018B (en) * | 2013-09-12 | 2020-12-11 | 瑞典商杜比國際公司 | Decoding method, and decoding device in multichannel audio system, computer program product comprising a non-transitory computer-readable medium with instructions for performing decoding method, audio system comprising decoding device |
US9978385B2 (en) | 2013-10-21 | 2018-05-22 | Dolby International Ab | Parametric reconstruction of audio signals |
US10614825B2 (en) | 2013-10-21 | 2020-04-07 | Dolby International Ab | Parametric reconstruction of audio signals |
US10242685B2 (en) | 2013-10-21 | 2019-03-26 | Dolby International Ab | Parametric reconstruction of audio signals |
RU2648947C2 (en) * | 2013-10-21 | 2018-03-28 | Долби Интернэшнл Аб | Parametric reconstruction of audio signals |
US11450330B2 (en) | 2013-10-21 | 2022-09-20 | Dolby International Ab | Parametric reconstruction of audio signals |
US11769516B2 (en) | 2013-10-21 | 2023-09-26 | Dolby International Ab | Parametric reconstruction of audio signals |
WO2015059153A1 (en) * | 2013-10-21 | 2015-04-30 | Dolby International Ab | Parametric reconstruction of audio signals |
RU2708441C2 (en) * | 2015-06-24 | 2019-12-06 | Сони Корпорейшн | Audio processing device, method and program |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP1902443B1 (en) | Audio encoding and decoding | |
US20200335115A1 (en) | Audio encoding and decoding | |
US7961890B2 (en) | Multi-channel hierarchical audio coding with compact side information | |
JP6117997B2 (en) | Audio decoder, audio encoder, method for providing at least four audio channel signals based on a coded representation, method for providing a coded representation based on at least four audio channel signals with bandwidth extension, and Computer program | |
EP2751803B1 (en) | Audio object encoding and decoding | |
JP4772279B2 (en) | Multi-channel / cue encoding / decoding of audio signals | |
KR101303441B1 (en) | Audio coding using downmix | |
JP4601669B2 (en) | Apparatus and method for generating a multi-channel signal or parameter data set | |
WO2008100100A1 (en) | Methods and apparatuses for encoding and decoding object-based audio signals | |
US8626503B2 (en) | Audio encoding and decoding | |
Breebaart et al. | Spatial audio object coding (SAOC)-the upcoming MPEG standard on parametric object based audio coding | |
JP2015528926A (en) | Generalized spatial audio object coding parametric concept decoder and method for downmix / upmix multichannel applications | |
He et al. | A study on the frequency-domain primary-ambient extraction for stereo audio signals | |
KR20160003572A (en) | Method and apparatus for processing multi-channel audio signal | |
KR20160101692A (en) | Method for processing multichannel signal and apparatus for performing the method | |
MX2008010631A (en) | Audio encoding and decoding |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
WWE | Wipo information: entry into national phase | Ref document number: 2006766049; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 2008521009; Country of ref document: JP |
WWE | Wipo information: entry into national phase | Ref document number: MX/a/2008/000504; Country of ref document: MX |
WWE | Wipo information: entry into national phase | Ref document number: 211/CHENP/2008; Country of ref document: IN |
WWE | Wipo information: entry into national phase | Ref document number: 11995538; Country of ref document: US |
WWE | Wipo information: entry into national phase | Ref document number: 200680025555.5; Country of ref document: CN |
NENP | Non-entry into the national phase | Ref country code: DE |
WWW | Wipo information: withdrawn in national office | Ref document number: DE |
WWE | Wipo information: entry into national phase | Ref document number: 2008105556; Country of ref document: RU; Ref document number: 1020087003599; Country of ref document: KR |
WWP | Wipo information: published in national office | Ref document number: 2006766049; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 1020107024547; Country of ref document: KR |
ENP | Entry into the national phase | Ref document number: PI0613469; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20080111 |