WO2007078254A2 - Personalized decoding of multi-channel surround sound - Google Patents


Info

Publication number
WO2007078254A2
WO2007078254A2 (PCT/SE2007/000006)
Authority
WO
WIPO (PCT)
Prior art keywords
parameters
spatial
spatial parameters
modifying
bitstream
Prior art date
Application number
PCT/SE2007/000006
Other languages
French (fr)
Other versions
WO2007078254A3 (en)
Inventor
Anisse Taleb
Erlendur Karlsson
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to EP07701092A priority Critical patent/EP1969901A2/en
Priority to BRPI0706285-0A priority patent/BRPI0706285A2/en
Publication of WO2007078254A2 publication Critical patent/WO2007078254A2/en
Publication of WO2007078254A3 publication Critical patent/WO2007078254A3/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2420/00 Details of connection covered by H04R, not provided for in its groups
    • H04R2420/07 Applications of wireless loudspeakers or wireless microphones
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/03 Application of parametric coding in stereophonic audio systems
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H04S3/004 For headphones
    • H04S3/008 Systems employing more than two channels, e.g. quadraphonic, in which the audio signals are in digital form, i.e. employing more than two discrete digital channels

Definitions

  • A 3D audio rendering of a multi-channel surround sound can thus be delivered to a mobile terminal user by using an efficient parametric surround decoder to first obtain the multiple surround sound channels, for instance with the multi-channel decoder described above with reference to Fig. 4, and thereupon applying a binaural rendering such as that of Fig. 1 to the decoded channels; this two-stage prior-art chain is illustrated in Fig. 5.
  • The applications of such 3D audio rendering are multiple and include gaming, mobile TV shows using standards such as 3GPP MBMS or DVB-H, listening to music concerts, watching movies and in general multimedia services that contain a multi-channel audio component.
  • One disadvantage of this two-stage approach is its computational complexity: all N channels must first be fully synthesized and then each of them filtered through head related filters. A second disadvantage consists of the temporary memory that is needed in order to store the intermediate decoded channels, which must be buffered since they are needed in the second stage of 3D rendering.
  • Moreover, post-processing steps that usually are part of speech and audio codecs may affect the quality of such 3D audio rendering. Such post-processing is beneficial for listening in a loudspeaker environment, but it may introduce severe nonlinear phase distortion that is unequally distributed over the multiple channels and may impair the 3D audio rendering quality.
  • The spatial parameters received by a parametric multi-channel decoder may be transformed into a new set of spatial parameters that are used in order to obtain a different decoding of the multi-channel surround sound.
  • The transformed parameters may in particular be personalized spatial parameters, obtained by combining the received spatial parameters with a representation of the user's head related filters.
  • The personalized spatial parameters may also be obtained by combining the received spatial parameters, a representation of the user's head related filters and a set of additional rendering parameters determined by the user.
  • A subset of the set of additional rendering parameters may be interactive parameters that are set in response to user choices and may be changed during the listening process.
  • The set of additional rendering parameters may be time dependent parameters.
  • The method as described herein may allow a simple and efficient way to render surround sound, encoded by parametric encoders, on mobile devices.
  • The major advantages are a reduced complexity and an increased interactivity when listening through headphones using a mobile device. Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the methods, processes, instrumentalities and combinations particularly pointed out in the appended claims.
  • Fig. 1 is a block diagram illustrating a possible 3D audio or binaural rendering of a 5.1 audio signal.
  • Fig. 2 is a high-level description of the principles of a parametric multi-channel coding and decoding system.
  • Fig. 3 is a detailed block diagram of an efficient parametric audio encoder.
  • Fig. 4 is a detailed description of the parametric multi-channel audio decoder.
  • Fig. 5 shows 3D audio rendering of a decoded multi-channel signal (prior art).
  • Fig. 6 shows a personalized binaural decoding of multi-channel surround sound.
  • Fig. 7 is a generalized diagram of the spatial audio processing in the MPEG Surround decoder.
  • Fig. 8 shows an embodiment of the invention for personalized binaural decoding.
  • Fig. 9 is a schematic illustrating the combining of parameters.
  • Fig. 10 is a diagram illustrating the results of a listening test.
  • The block diagram of Fig. 6 illustrates the main steps in a method of decoding a parametric multi-channel surround audio bitstream as performed in a parametric surround decoder 13.
  • In the demultiplexer 29 the main bitstream and the spatial side information are recovered.
  • The main bitstream is first decoded in an M-channel audio decoder 31, from which the decoded signals ẑ_M(k,m) are input to the personalized spatial synthesis unit 17'.
  • The spatial side information holding the spatial parameters is provided from the demultiplexer 29 to a spatial parameter decoder 33 that produces the decoded parameters p̂_N(k,m).
  • The decoded spatial parameters are input to a parameter combining unit 37 that may also receive other parameter information, in particular personalized parameters and HRF information.
  • The combining unit produces new parameters that in particular may be personalized spatial parameters and are input to the synthesis unit 17'.
  • The spatial synthesis unit produces the binaural signal x̂_2(k,m) that is provided to the F/T transform unit 35, transforming back into the time domain.
  • The time domain signal is provided to e.g. the earphones 39 of a mobile terminal 41 in which the parametric surround decoder is running.
  • The additional information and parameters received by the combining unit 37 can be obtained from a parameter unit 43 that e.g. may be constructed to receive user input interactively during a listening session, such as from depressing a suitable key of the mobile terminal 41.
  • The processing in the MPEG Surround decoder can be defined by two matrix multiplications, as illustrated in the diagram of Fig. 7; the multiplications are shown as performed in matrix units M1 and M2, also called the pre-decorrelator matrix unit and the mix matrix unit, respectively, to which the respective signals are input.
  • The first matrix multiplication forms the input signals to the decorrelation units or decorrelators D_1, D_2, ..., and the second matrix multiplication forms the output signals based on the down-mix input and the output from the decorrelators.
  • The above operations are done for each hybrid subband. The index n is used for the number of a time slot, k is used to index a hybrid subband and l is used to index the parameter set.
  • M1^{n,k} is a two-dimensional matrix mapping a certain number of input channels to a certain number of channels going into the decorrelators, and is defined for every time slot n and every hybrid subband k.
  • M2^{n,k} is a two-dimensional matrix mapping a certain number of pre-processed channels to a certain number of output channels, and is defined for every time slot n and every hybrid subband k.
  • The matrix M2^{n,k} comes in two versions depending on whether time-domain temporal shaping (TP) or temporal envelope shaping (TES) of the decorrelated signal is used, the two versions denoted M2^{n,k,TP} and M2^{n,k,TES}.
  • The input vector x^{n,k} to the first matrix unit M1 corresponds to the decoded signals ẑ_M(k,m) of Fig. 6 obtained from the audio decoder 31.
  • The vector w^{n,k} that is input to the mix matrix unit M2 is a combination of the outputs d_1, d_2, ... from the decorrelators D_1, D_2, ..., the output from the first matrix multiplication, i.e. from the pre-decorrelator matrix unit M1, and residual signals res_1, res_2, ..., and is defined for every time slot n and every hybrid subband k.
  • The output vector y^{n,k} has components l_f, l_s, r_f, r_s, c and lfe that basically correspond to the signals L, SL, R, SR, C and LFE as described above.
  • These components cannot be used directly; they must be transformed to the time domain and in some way rendered before being provided to the earphones used.
  • A method for 3D audio rendering, and in particular personalized decoding, uses a decoder that includes a "reconstruct from model" block. This block takes extra input, such as a representation of the personal 3D audio filters in the hybrid filter-bank domain, and uses it to transform derivatives of the model parameters into other model parameters that allow generating the two binaural signals directly in the transform domain. Thus only the binaural 2-channel signal has to be transformed into the discrete time domain, compare the transform unit 35 in Fig. 6.
  • A third matrix M3^{n,k}, the parameter modification matrix, is in this example a linear mapping from six channels to two channels, which are used as input to the user headphones 39 through the transform unit 35. The matrix multiplication can be written as
    (l_b^{n,k}, r_b^{n,k})^T = M3^{n,k} y^{n,k}
    where l_b and r_b denote the left and right binaural channels.
  • Additional binaural post-processing may also be done and is outside the scope of the method as described herein. This may include further post-processing of the left and right channels.
  • The new mix matrix M̃2^{n,k} = M3^{n,k} M2^{n,k} has parameters that depend both on the bitstream parameters and on the user predefined head related filters (HRFs), as well as on other dynamic rendering parameters if desired.
  • The matrix M3^{n,k} can be written as
    M3^{n,k} = | H_F^i(k)  H_B^i(k)  H_F^c(k)  H_B^c(k)  H_C(k)  H_C(k) |
               | H_F^c(k)  H_B^c(k)  H_F^i(k)  H_B^i(k)  H_C(k)  H_C(k) |
    the matrix elements being the five different filters used to implement the head related filtering, as above denoted H_F^i, H_F^c, H_C, H_B^i and H_B^c.
  • The filters are here represented in the hybrid domain. Operations to transform filters from the time domain to the frequency or transform domain are well known in the signal processing literature.
  • The filters that form the matrix M3^{n,k} are functions of the hybrid subband index k and are similar to those illustrated in Fig. 1. In this case the matrix M3^{n,k} is independent of the time slot index n.
  • The head related filters might also be changed dynamically if the user wants another virtual loudspeaker configuration to be experienced through the headphones 39.
  • The user may want to interactively change his or her spatial position, for instance to experience how it is to be close to the concert scene when a live concert is played, or farther away. This could easily be implemented by adding delay lines to the parameter modification matrix M3^{n,k}.
  • The user action may be dynamic, and in that case the matrix M3^{n,k} is dependent on the time slot index n.
  • The user may also want to experience different spatial sensations; reverberation and other sound effects can be efficiently introduced in the matrix M3^{n,k}.
  • In general, the parameter modification matrix M3^{n,k} can contain additional rendering parameters that are interactive and are changed in response to user input.
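The matrix chain described in the bullets above (pre-decorrelator matrix M1, mix matrix M2, and the parameter modification matrix M3 folded into a new mix matrix) can be sketched numerically for one time slot and one hybrid subband. All sizes, gains and the stand-in decorrelator below are illustrative assumptions, not values from the MPEG Surround standard:

```python
import numpy as np

rng = np.random.default_rng(0)

# One time slot n and one hybrid subband k; toy sizes: M = 2 down-mix
# channels, 2 decorrelators, N = 6 output channels.
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)   # x^{n,k}
M1 = rng.standard_normal((2, 2))      # pre-decorrelator matrix M1^{n,k}
M2 = rng.standard_normal((6, 4))      # mix matrix M2^{n,k}

v = M1 @ x                             # inputs to the decorrelators D1, D2
d = v * np.exp(1j * 0.7)               # stand-in for the all-pass decorrelators
w = np.concatenate([v, d])             # w^{n,k} (residual signals omitted)
y = M2 @ w                             # y^{n,k}: lf, ls, rf, rs, c, lfe

# Hybrid-domain head related filter gains for this subband (illustrative
# real gains; in general one complex value per filter and subband).
hFi, hFc, hBi, hBc, hC = 1.0, 0.5, 0.8, 0.4, 0.9
M3 = np.array([[hFi, hBi, hFc, hBc, hC, hC],    # left ear
               [hFc, hBc, hFi, hBi, hC, hC]])   # right ear

# Folding M3 into the mix matrix gives a new 2 x 4 mix matrix that yields
# the binaural pair directly, so only two channels ever need synthesis.
M2_tilde = M3 @ M2
binaural = M2_tilde @ w
assert np.allclose(binaural, M3 @ y)
```

Because matrix multiplication is associative, applying M2_tilde is exactly equivalent to first mixing to six channels and then applying M3, which is the complexity advantage the bullets describe.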

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

A parametric multi-channel surround audio bitstream is received in a multi-channel decoder (13). The received spatial parameters are transformed in a combining unit (37) into a new set of spatial parameters that are used in order to obtain a decoding of the multi-channel surround sound that is not a simple equivalent of the original input multi-channel surround signal but e.g. may be personalized, by making the transformation based on a representation of user head related filters obtained from a unit (43). Such personalized spatial parameters may also be obtained by combining the received spatial parameters and a representation of the user head related filters with a set of additional rendering parameters that for example are interactively determined by the user and thus are time dependent.

Description

PERSONALIZED DECODING OF MULTI-CHANNEL SURROUND SOUND
RELATED APPLICATION
This application claims priority and benefit from U.S. provisional patent application No. 60/743,096, filed January 5, 2006, the entire teachings of which are incorporated herein by reference.
TECHNICAL FIELD
The present invention is related to decoding a multi-channel surround audio bitstream.
BACKGROUND
In film theaters around the world, multi-channel surround audio systems have long placed film audiences in the center of the audio spaces of the film scenes being played before them, giving them a realistic and convincing feeling of "being there". This audio technology has moved into the homes of ordinary people as home surround sound theatre systems and is now providing them with the sense of "being there" in their own living rooms.
The next field where this audio technology will be used includes mobile wireless units or terminals, in particular small units such as cellular telephones and PDAs. There the immersive nature of the surround sound is even more important because of the small sizes of the displays.
Moving this technology to mobile units is, however, not a trivial matter. The main obstacles include that:
1. The available bit-rate is in many cases low in wireless mobile channels.
2. The processing power of mobile terminals is often limited.
3. Small mobile terminals generally have only two micro speakers and earplugs or headphones.
This means, in particular for mobile terminals such as cellular telephones, that a surround sound solution for a mobile terminal has to use a much lower bit rate than the 384 kbits/s used in the Dolby Digital 5.1 system. Due to the limited processing power, the decoders of the mobile terminals must be computationally optimized and due to the speaker configuration of the mobile terminal, the surround sound must be delivered through the earplugs or headphones.
A standard way of delivering multi-channel surround sound through headphones or earplugs is to perform a 3D audio or binaural rendering of each of the speaker signals.
In general, in 3D audio rendering a model of the audio scene is used and each incoming monophonic signal is filtered through a set of filters that model the transformations created by the human head, torso and ears. These filters are called head related filters (HRFs) having head related transfer functions (HRTFs) and if appropriately designed, they give a good 3D audio scene perception.
The diagram of Fig. 1 illustrates a method of complete 3D audio rendering of an audio signal according to the Dolby Digital 5.1 system. The six multi-channel signals according to the Dolby Digital 5.1 system are:
- surround right (SR),
- right (R),
- center (C),
- low frequency (LFE),
- left (L),
- surround left (SL).
In the example illustrated in Fig. 1 the center and low frequency signals are combined into one signal. Then, five different filters H_F^i, H_F^c, H_C, H_B^i and H_B^c are needed in order to implement this method of head related filtering, the superscripts i and c denoting the ipsilateral and contralateral ear and the subscripts F, B and C the front, back and center positions. The SR signal is input to the filters H_B^i and H_B^c, the R signal is input to the filters H_F^i and H_F^c, the C and LFE signals are jointly input to the filter H_C, the L signal is input to the filters H_F^i and H_F^c, and the SL signal is input to the filters H_B^i and H_B^c. The outputs of the filters fed by R and SR through the ipsilateral paths, by L and SL through the contralateral paths and by the combined center signal through H_C are summed in a right summing element 1R to give a signal intended to be provided to the right headphone, not shown. Correspondingly, the filter outputs for the left ear are summed in a left summing element 1L to give a signal intended to be provided to the left headphone, not shown.
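As a sketch of this filter-and-sum structure, the following assumes the left/right symmetry of Fig. 1 (ipsilateral filters for the same-side front and surround channels, contralateral ones for the opposite side), with dummy random filters standing in for measured head related filters:

```python
import numpy as np

def render_5_1(ch, h):
    """Binaural rendering of a 5.1 signal per the Fig. 1 structure.
    ch: dict of 1-D arrays 'L', 'R', 'C', 'LFE', 'SL', 'SR';
    h:  dict of the five head related filters 'Fi', 'Fc', 'C', 'Bi', 'Bc'
        (front/back ipsilateral/contralateral and center)."""
    c = ch['C'] + ch['LFE']                      # center and LFE combined
    f = lambda sig, name: np.convolve(sig, h[name])
    left = (f(ch['L'], 'Fi') + f(ch['SL'], 'Bi') + f(c, 'C')
            + f(ch['R'], 'Fc') + f(ch['SR'], 'Bc'))
    right = (f(ch['R'], 'Fi') + f(ch['SR'], 'Bi') + f(c, 'C')
             + f(ch['L'], 'Fc') + f(ch['SL'], 'Bc'))
    return left, right

# Dummy signals and filters, only to exercise the data flow.
rng = np.random.default_rng(1)
ch = {k: rng.standard_normal(256) for k in ('L', 'R', 'C', 'LFE', 'SL', 'SR')}
h = {k: rng.standard_normal(32) for k in ('Fi', 'Fc', 'C', 'Bi', 'Bc')}
left, right = render_5_1(ch, h)
```

Note how the same five filters serve both ears, only with the channel roles swapped, which is the symmetry that keeps the filter count at five.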
The quality in terms of 3D perception of such rendering depends on how closely the HRFs model or represent the listener's own head related filtering when she/he is listening. Hence, it may be advantageous if the HRFs can be adapted and personalized for each listener if a good or very good quality is desired. This adaptation and personalization step may include modeling, measurement and in general a user dependent tuning in order to refine the quality of the perceived 3D audio scene.
Current state-of-the-art standardized multi-channel audio codecs require a high amount of bandwidth or a high bit-rate in order to reach an acceptable quality, and thus they prohibit the use of such codecs for services such as wireless mobile streaming. For instance, even if the Dolby Digital 5.1 system (AC-3 codec) has very low complexity when compared to an AAC multi-channel codec, it requires a much higher bit-rate for similar quality. Both codecs, the AAC multi-channel codec and the AC-3 codec, remain until today unusable in the wireless mobile domain because of the high demands that they make on computational complexity and bit-rate.

New parametric multi-channel codecs based on the principles of binaural cue coding have been developed. The recently standardized parametric stereo tool is a good example of the low complexity/high quality parametric technique for encoding stereophonic sound. The extension of parametric stereo to multi-channel coding is currently under standardization in MPEG under the name Spatial Audio Coding, and is also known as MPEG Surround. The principles of parametric multi-channel coding can be explained and understood from the block diagram of Fig. 2 that illustrates a general case. A parametric surround encoder 3, also called a multi-channel parametric surround encoder, receives a multi-channel, composite audio signal comprising the individual signals x_1(n) to x_N(n), where N is the number of input channels. For a Dolby Digital 5.1 surround system N = 6 as stated above. The encoder 3 then forms in a down-mixing unit 5 a composite down-mixed signal comprising the individual down-mixed signals z_1(n) to z_M(n). The number M of down-mixed channels (M < N) is dependent upon the required or allowable maximum bit-rate, the required quality and the availability of an M-channel audio encoder 7.
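For illustration, a fixed 5.1-to-stereo down-mix can be written as an M x N matrix. The 1/sqrt(2) gains below are the common ITU-style choice and are an assumption here; a real encoder's down-mix may be time and frequency dependent:

```python
import numpy as np

g = 1 / np.sqrt(2)
# Channel order: L, R, C, LFE, SL, SR -> stereo down-mix z1 (left), z2 (right)
D = np.array([[1, 0, g, g, g, 0],
              [0, 1, g, g, 0, g]])

x = np.random.default_rng(0).standard_normal((6, 480))  # one frame, N = 6
z = D @ x                                               # M = 2 down-mix
```

The same matrix view makes clear why only M channels need to be compressed by the core codec: the down-mix is a fixed linear projection of the N input channels.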
One key aspect of the encoding process is that the down-mixed composite signal, typically a stereo signal but it could also be a mono signal, is derived from the multi-channel input signal, and it is this down-mixed composite signal that is compressed in the audio encoder 7 for transmission over the wireless channel 9 rather than the original multi-channel signal. The parametric encoder 3, and in particular the down-mixing unit 5 thereof, may be capable of performing a down-mixing process such that it creates a more or less true equivalent of the multi-channel signal in the mono or stereo down-mix. The parametric surround encoder also comprises a spatial parameter estimation unit 9 that from the input signals x_1(n) to x_N(n) computes the cues or spatial parameters that in some way can be said to describe the down-mixing process or the assumptions made therein. The compressed audio signal, which is output from the M-channel audio encoder and also is the main signal, is together with the spatial parameters, which constitute side information, transmitted over an interface 11 such as a wireless interface to the receiving side, which in the case considered here typically is a mobile terminal.
Alternatively, the down-mixing could be supplied by some external unit, such as from a unit employing Artistic Downmix.
On the receiving side, a complementary parametric surround decoder 13 includes an audio decoder 15 and should be constructed to be capable of creating the best possible multi-channel decoding based on knowledge of the down-mixing algorithm used on the transmitting side and the encoded spatial parameters or cues that are received in parallel to the compressed multi-channel signal. The audio decoder 15 produces signals ẑ_1(n) to ẑ_M(n) that should be as similar as possible to the signals z_1(n) to z_M(n) on the transmitting side. These are, together with the spatial parameters, input to a spatial synthesis unit 17 that produces output signals x̂_1(n) to x̂_N(n) that should be as similar as possible to the original input signals x_1(n) to x_N(n) on the transmitting side. The output signals x̂_1(n) to x̂_N(n) can be input to a binaural rendering system such as that shown in Fig. 1.
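The spatial synthesis step can be sketched for the simplest case: one band of a stereo pair rebuilt from a mono down-mix z and a decorrelated signal d, driven by a level-difference and a coherence cue. The mixing rule below (the weights c1, c2 and the gain normalization) illustrates the structure only and is not the actual MPEG Surround mixing math; z and d are assumed unit-power, mutually uncorrelated subband signals:

```python
import numpy as np

def synthesize_stereo(z, d, ild_db, ic):
    """Toy spatial synthesis of a stereo pair from a mono down-mix z and a
    decorrelated signal d, given an inter-channel level difference (dB)
    and a coherence ic in [0, 1]."""
    c1 = np.sqrt((1 + ic) / 2)      # weight of the coherent part
    c2 = np.sqrt((1 - ic) / 2)      # weight of the decorrelated part
    g = 10 ** (ild_db / 20)         # left/right amplitude ratio
    gl = np.sqrt(2) * g / np.sqrt(1 + g ** 2)
    gr = np.sqrt(2) / np.sqrt(1 + g ** 2)
    left = gl * (c1 * z + c2 * d)
    right = gr * (c1 * z - c2 * d)
    return left, right

rng = np.random.default_rng(0)
n = 200_000
z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
d = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
left, right = synthesize_stereo(z, d, ild_db=6.0, ic=0.3)
```

With this construction the synthesized pair statistically reproduces the requested cues: the power ratio between left and right approaches the requested level difference, and their normalized cross-correlation approaches ic.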
It is obvious that, depending on the bandwidth of the transmitting channel over the interface 11, which generally is relatively low, there will be a loss of information, and hence the signals ẑ_1(n) to ẑ_M(n) and x̂_1(n) to x̂_N(n) on the receiving side cannot be the same as their counterparts on the transmitting side. Even though they are not quite true equivalents of their counterparts, they may be sufficiently good equivalents. In general, such a surround encoding process is independent of the compression algorithm used for the transmitted channels in the audio encoder 7 and the audio decoder 15 of Fig. 2. The encoding process can use any of a number of high-performance compression algorithms such as AMR-WB+, MPEG-1 Layer III, MPEG-4 AAC or MPEG-4 High Efficiency AAC, and it could even use PCM. In general, the above operations are done in a transformed signal domain, such as the Fourier transform or MDCT domain. This is especially beneficial if the spatial parameter estimation and synthesis in the units 9 and 17 use the same type of transform as that used in the audio encoder 7, also called the core codec.
Fig. 3 is a detailed block diagram of an efficient parametric audio encoder. The N-channel discrete-time input signal, denoted in vector form as xN(n), is first transformed in a transform unit 21 to the frequency domain, or in general to a transform domain, giving a signal xN(k,m).
The index k is the index of the transform coefficients, or of the sub-bands if a frequency-domain transform is chosen. The index m represents the decimated time-domain index, which is related to the input signal possibly through overlapped frames. The signal is thereafter down-mixed in a down-mixing unit 5 to generate the M-channel down-mix signal zM(k,m), where M < N. A sequence of spatial model parameter vectors pN(k,m) is estimated in an estimation unit 9. This can be done in either an open-loop or a closed-loop fashion.
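As a schematic illustration of the down-mixing step (not part of the patented method itself), the following Python sketch down-mixes one subband tile of a five-channel signal to stereo. The -3 dB coefficients and the function name are assumptions for illustration; the actual down-mix matrix in unit 5 is encoder-specific and may even be an externally supplied artistic down-mix, as noted above.

```python
import math

def downmix_5_to_2(lf, ls, rf, rs, c):
    """Static 5-to-2 down-mix of one subband tile (one list of samples per channel).

    The coefficients follow the common ITU-R BS.775 convention (an assumption;
    any other fixed or artistic down-mix could be used instead).
    """
    g = 1.0 / math.sqrt(2.0)  # -3 dB gain applied to centre and surround channels
    left = [x + g * (cc + ss) for x, cc, ss in zip(lf, c, ls)]
    right = [x + g * (cc + ss) for x, cc, ss in zip(rf, c, rs)]
    return left, right
```

A centre-only input thus ends up attenuated by 3 dB but equally present in both down-mix channels.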
Spatial parameters consist of psycho-acoustical cues that are representative of the surround sound sensation. For instance, in the MPEG surround encoder, these parameters consist of inter-channel differences in level, phase and coherence, equivalent to the ILD, ITD and IC cues, to capture the spatial image of a multi-channel audio signal relative to a transmitted down-mixed signal zM(k,m) (or, if in closed loop, the decoded signal ẑM(k,m)). The cues pN(k,m) can be encoded in a very compact form, such as in a spatial parameter quantization unit 23 producing the signal p̂N(k,m) followed by a spatial parameter encoder 25. The M-channel audio encoder 7 produces the main bitstream, which in a multiplexer 27 is multiplexed with the spatial side information produced by the parameter encoder. From the multiplexer the multiplexed signal is transmitted to a demultiplexer 29 on the receiving side, in which the side information and the main bitstream are recovered, as seen in the block diagram of Fig. 4. On the receiving side the main bitstream is decoded to synthesize a high-quality multi-channel representation using the received spatial parameters. The main bitstream is first decoded in an M-channel audio decoder 31, from which the decoded signals ẑM(k,m) are input to the spatial synthesis unit 17. The spatial side information holding the spatial parameters is extracted by the demultiplexer 29 and provided to a spatial parameter decoder 33 that produces the decoded parameters p̂N(k,m) and transmits them to the synthesis unit 17. The spatial synthesis unit produces the signal x̂N(k,m), which is provided to the F/T transform unit 35 for transformation into the time domain, producing the signal x̂N(n), i.e. the multi-channel decoded signal.
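The level and coherence cues mentioned above can be illustrated with a short sketch (the function names and the restriction to real-valued subband samples are simplifications of mine; the real estimation unit 9 works per parameter band on complex subband samples, and units 23 and 25 additionally quantize and entropy-code the result):

```python
import math

def channel_level_difference(x1, x2, eps=1e-12):
    """Inter-channel level difference (ILD/CLD cue) in dB for one parameter band:
    the ratio of subband energies of two channels."""
    e1 = sum(v * v for v in x1)
    e2 = sum(v * v for v in x2)
    return 10.0 * math.log10((e1 + eps) / (e2 + eps))

def channel_coherence(x1, x2, eps=1e-12):
    """Normalized inter-channel coherence (IC cue) for one band; 1.0 means the
    channels are fully correlated (real-valued sketch)."""
    e1 = sum(v * v for v in x1)
    e2 = sum(v * v for v in x2)
    e12 = sum(a * b for a, b in zip(x1, x2))
    return abs(e12) / math.sqrt((e1 + eps) * (e2 + eps))
```

For example, a channel with twice the amplitude of another yields a level difference of about 6 dB, and identical channels yield a coherence of 1.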
A 3D audio rendering of a multi-channel surround sound can be delivered to a mobile terminal user by using an efficient parametric surround decoder to first obtain the multiple surround sound channels, using for instance the multi-channel decoder described above with reference to Fig. 4. Thereupon, the system illustrated in Fig. 1 is used to synthesize a binaural 3D-audio rendered multi-channel signal. This operation is shown in the schematic of Fig. 5. Work has also been done in which spatial or 3D audio filtering has been performed in the subband domain. In C.A. Lanciani and R.W. Schafer, "Application of Head-related Transfer Functions to MPEG Audio Signals", Proc. 31st Symposium on System Theory, March 21-23, 1999, Auburn, AL, U.S.A., it is disclosed how an MPEG coded mono signal can be spatialized by performing the HR filtering operation in the subband domain. In A.B. Touimi, M. Emerit and J.M. Pernaux, "Efficient Method for Multiple Compressed Audio Streams Spatialization," Proc. 3rd International Conference on Mobile and Ubiquitous Multimedia, pp. 229-235, October 27-29, 2004, College Park, Maryland, U.S.A., it is disclosed how a number of individually MPEG coded mono signals can be spatialized by doing the HR filtering operations in the subband domain. The solution is based on a special implementation of the HR filters, in which all HR filters are modeled as a linear combination of a few predefined basis filters.
Applications of 3D audio rendering are numerous and include gaming, mobile TV shows using standards such as 3GPP MBMS or DVB-H, listening to music concerts, watching movies and, in general, multimedia services that contain a multi-channel audio component.
The methods described above of rendering multi-channel surround sound, although attractive since they allow a whole new set of services to be provided to wireless mobile units, have many drawbacks:
First of all, the computational demands of such rendering are prohibitive, since both decoding and 3D rendering have to be performed in parallel and in real time. The complexity of a parametric multi-channel decoder, even if low when compared to that of a full waveform multi-channel decoder, is still quite high, and at least higher than that of a simple stereo decoder. The synthesis stage of spatial decoding has a complexity that is at least proportional to the number of encoded channels. Additionally, the filtering operations of 3D rendering are also proportional to the number of channels.
The second disadvantage is the temporary memory that is needed to store the intermediate decoded channels. These must in fact be buffered, since they are needed in the second stage of 3D rendering.
Finally, the post-processing steps that usually are part of speech and audio codecs may affect the quality of such 3D audio rendering. These post-processing steps are beneficial for listening in a loudspeaker environment. However, they may introduce severe nonlinear phase distortion that is unequally distributed over the multiple channels and that may impair the 3D audio rendering quality.
SUMMARY
It is an object of the invention to provide an efficient and versatile method of decoding a parametric multi-channel surround audio bitstream. It is another object of the invention to provide a mobile terminal in which a parametric multi-channel surround audio bitstream can be efficiently decoded to produce a signal or signals suitable for being provided to listening equipment in or connected to the mobile terminal.
In a method of decoding a parametric multi-channel surround audio bitstream, concepts such as decoding of multi-channel surround sound, and in particular binaural decoding of multi-channel surround sound, are used.
In such a method the spatial parameters received by a parametric multi-channel decoder may be transformed into a new set of spatial parameters that are used in order to obtain a different decoding of multi-channel surround sound.
The transformed parameters may also be personalized spatial parameters and can then be obtained by combining both the received spatial parameters and a representation of user head related filters.
The personalized spatial parameters may also be obtained by combining the received spatial parameters and a representation of the user head related filters and a set of additional rendering parameters determined by the user. A subset of the set of additional rendering parameters may be interactive parameters that are set in response to user choices that may be changed during the listening process.
The set of additional rendering parameters may be time dependent parameters.
The method as described herein may allow a simple and efficient way to render surround sound that is encoded by parametric encoders on mobile devices. The major advantage is a reduced complexity and an increased interactivity when listening through headphones using a mobile device. Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the methods, processes, instrumentalities and combinations particularly pointed out in the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
While the novel features of the invention are set forth with particularity in the appended claims, a complete understanding of the invention, both as to organization and content, and of the above and other features thereof, may be gained from, and the invention will be better appreciated from, a consideration of the following detailed description of non-limiting embodiments presented hereinbelow with reference to the accompanying drawings, in which:
- Fig. 1 is a block diagram illustrating a possible 3D audio or binaural rendering of a 5.1 audio signal,
- Fig. 2 is a high-level description of the principles of a parametric multi-channel coding and decoding system,
- Fig. 3 is a detailed description of the parametric multi-channel audio encoder,
- Fig. 4 is a detailed description of the parametric multi-channel audio decoder,
- Fig. 5 is a 3D-audio rendering of a decoded multi-channel signal (prior art),
- Fig. 6 is a personalized binaural decoding of multi-channel surround sound,
- Fig. 7 is a generalized diagram of the spatial audio processing in the MPEG surround decoder,
- Fig. 8 is an embodiment of the invention for personalized binaural decoding,
- Fig. 9 is a schematic illustrating combining parameters, and
- Fig. 10 is a diagram illustrating results of listening tests.
DETAILED DESCRIPTION
The block diagram of Fig. 6 illustrates the main steps in a method of decoding a parametric multi-channel surround audio bitstream as performed in a parametric surround decoder 13. In the demultiplexer 29 the main bitstream and the spatial side information are recovered. The main bitstream is first decoded in an M-channel audio decoder 31, from which the decoded signals ẑM(k,m) are input to the personalized spatial synthesis unit 17'. The spatial side information holding the spatial parameters is provided from the demultiplexer 29 to a spatial parameter decoder 33 that produces the decoded parameters p̂N(k,m). The decoded spatial parameters are input to a parameter combining unit 37 that may also receive other parameter information, in particular personalized parameters and HRF information. The combining unit produces new parameters, which in particular may be personalized spatial parameters, and these are input to the synthesis unit 17'. The spatial synthesis unit produces the signal x̂2(k,m), which is provided to the F/T transform unit 35 for transformation back into the time domain. The time-domain signal is provided to e.g. the earphones 39 of a mobile terminal 41 in which the parametric surround decoder is running. The additional information and parameters received by the combining unit 37 can be obtained from a parameter unit 43 that e.g. may be constructed to receive user input interactively during a listening session, such as from depressing some suitable key of the mobile terminal or unit 41.
The method as embodied in an MPEG surround multi-channel decoder, see Text of ISO/IEC 14496-3:200X/PDAM 4, MPEG Surround, N7530, October 2005, Nice, France, will now be described. However, it is obvious that the method can equally well be used in other contexts.
The processing in the MPEG surround decoder can be defined by two matrix multiplications, as illustrated in the diagram of Fig. 7, the multiplications shown as including matrix units M1 and M2, also called the predecorrelator matrix unit and the mix matrix unit, respectively, to which the respective signals are input. The first matrix multiplication forms the input signals to the decorrelation units or decorrelators D1, D2, ..., and the second matrix multiplication forms the output signals based on the down-mix input and the output from the decorrelators. The above operations are done for each hybrid subband, indexed by the hybrid subband index k.
In the following, the index n is used for the number of a time slot, k is used to index a hybrid subband, and l is used to index the parameter set. The processing of the input channels to form the output channels can then be described as:

v^{n,k} = M1^{n,k} x^{n,k},
y^{n,k} = M2^{n,k} w^{n,k},

where M1^{n,k} is a two-dimensional matrix mapping a certain number of input channels to a certain number of channels going into the decorrelators, and is defined for every time slot n and every hybrid subband k, and M2^{n,k} is a two-dimensional matrix mapping a certain number of pre-processed channels to a certain number of output channels, and is defined for every time slot n and every hybrid subband k. The matrix M2^{n,k} comes in two versions depending on whether time-domain temporal shaping (TP) or temporal envelope shaping (TES) of the decorrelated signal is used, the two versions denoted M2^{n,k,TP} and M2^{n,k,TES}.
The input vector x^{n,k} to the first matrix unit M1 corresponds to the decoded signals ẑM(k,m) of Fig. 6 obtained from the audio decoder 31. The vector w^{n,k} that is input to the mix matrix unit M2 is a combination of the outputs d1, d2, ... from the decorrelators D1, D2, ..., the output from the first matrix multiplication, i.e. from the predecorrelator matrix unit M1, and residual signals res1, res2, ..., and is defined for every time slot n and every hybrid subband k. The output vector y^{n,k} has components lf, ls, rf, rs, cf and lfe that basically correspond to the signals L, SL, R, SR, C and LFE as described above. These components cannot be used directly; they must be transformed to the time domain and in some way rendered before being provided to the earphones used.
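The two-matrix processing for one time slot n and hybrid subband k can be sketched as follows. This is an illustrative simplification of mine, not the normative MPEG surround processing: the residual inputs are omitted, and `decorrelate` stands in for the actual all-pass decorrelators D1, D2, ....

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def spatial_synthesis(m1, m2, x, decorrelate, n_direct):
    """One (n, k) tile of the two-matrix synthesis: m1 maps the down-mix x to
    pre-decorrelator signals v; the first n_direct of those pass through
    directly while the rest go through the decorrelators; m2 mixes the
    combined vector w into the output channels y."""
    v = matvec(m1, x)
    w = v[:n_direct] + [decorrelate(s) for s in v[n_direct:]]
    return matvec(m2, w)
```

With a sign-flipping stand-in decorrelator, a mono down-mix can for instance be spread into two decorrelated output channels.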
A method for 3D audio rendering, and in particular personalized decoding, uses a decoder that includes a "Reconstruct from Model" block that takes extra input, such as a representation of the personal 3D audio filters in the hybrid filter-bank domain, and uses it to transform derivatives of the model parameters to other model parameters that allow generating the two binaural signals directly in the transform domain, so that only the binaural 2-channel signal has to be transformed into the discrete time domain, compare the transform unit 35 in Fig. 6.
An embodiment for personalized binaural decoding based on MPEG surround is illustrated in the diagram of Fig. 8. A third matrix M3^{n,k}, symbolically shown as the parameter modification matrix M3, is in this example a linear mapping from six channels to two channels, which are used as input to the user headphones 39 through the transform unit 35. The matrix multiplication can be written as

x̂^{n,k} = M3^{n,k} y^{n,k},

where x̂^{n,k} holds the two binaural channels, corresponding to the signal x̂2(k,m) in Fig. 6.
Additional binaural post-processing may also be done and is outside the scope of the method as described herein. This may include further post-processing of the left and right channels.
By linearity (the associative law) it is clear that the matrices M2^{n,k} and M3^{n,k} can be combined to form a new set of parameters stored in a new mix matrix M4^{n,k} = M3^{n,k} M2^{n,k}. This combining operation is illustrated in Fig. 9, where the multiplication unit corresponding to the new matrix is shown as the mix matrix unit M4 and the multiplication of the two matrices is made in a multiplying unit 45.
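The combining step rests entirely on associativity: M3 (M2 w) = (M3 M2) w, so pre-multiplying the two matrices once per parameter update is equivalent to, but cheaper than, applying them in sequence for every sample. A small sketch (matrix shapes here are illustrative; the actual column count of M2 depends on the number of down-mix, decorrelated and residual signals):

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Plain matrix product of two lists-of-rows: the combined mix matrix
    M4 = M3 * M2 computed once in the multiplying unit 45."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]
```

Applying the combined matrix to any vector then gives the same result as the two-stage mix.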
The new mix matrix M4^{n,k} has parameters that depend both on the bitstream parameters and on the user-predefined head related filters (HRFs), as well as on other dynamic rendering parameters if desired.
For the case of head related filters only, the matrix M3^{n,k} can be written as

M3^k = [ H_I^F(k)  H_I^B(k)  H_C^F(k)  H_C^B(k)  H_C(k)  H_C(k)
         H_C^F(k)  H_C^B(k)  H_I^F(k)  H_I^B(k)  H_C(k)  H_C(k) ],

the matrix elements being the five different filters which are used to implement the head related filtering, denoted as above H_I^F, H_I^B, H_C^F, H_C^B and H_C. In this case the filters are represented in the hybrid domain. Operations to take filters from the time domain to the frequency or transform domain are well known in the signal processing literature. Here the filters that form the matrix M3^k are functions of the hybrid subband index k and are similar to those illustrated in Fig. 1.
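Construction of such a 2×6 matrix for one subband can be sketched as below. The function name, the argument names (front/back ipsilateral, front/back contralateral, centre) and the assumed column order (lf, ls, rf, rs, cf, lfe) are illustrative choices of mine; the key point is the left/right symmetry, which is why only five filters are needed for six input channels:

```python
def binaural_mix_matrix(h_fi, h_bi, h_fc, h_bc, h_c):
    """2x6 parameter-modification matrix for one hybrid subband k, built from
    five head-related filter responses: front/back ipsilateral (h_fi, h_bi),
    front/back contralateral (h_fc, h_bc) and centre (h_c).  Columns are
    assumed ordered (lf, ls, rf, rs, cf, lfe), matching the output vector."""
    left = [h_fi, h_bi, h_fc, h_bc, h_c, h_c]   # left-ear row
    right = [h_fc, h_bc, h_fi, h_bi, h_c, h_c]  # right-ear row: mirrored
    return [left, right]
```

Note that swapping the left and right input channels simply swaps the two rows, reflecting the assumed symmetric head model.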
It should be noted that for this simple case the matrix M3^k is independent of the time slot index n. The head related filters might also be changed dynamically, if the user wants another virtual loudspeaker configuration to be experienced through the headphones 39.
In another embodiment, the user may want to interactively change his spatial position. By this it is meant that the user may want to experience how it is to be close to the concert scene if for instance a live concert is played, or farther away. This could easily be implemented by adding delay lines to the parameter modification matrix M3^{n,k}. The user action may be dynamic, and in that case the matrix M3^{n,k} is dependent on the time slot index n.
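One way to realize such a delay inside the subband-domain matrix, sketched here under assumptions of mine (a complex-modulated filter bank with uniformly spaced subband centres; the hybrid filter bank of the real decoder has non-uniform low-frequency bands), is to rotate each matrix entry by the phase a delay of d samples would cause at the subband centre frequency:

```python
import cmath
import math

def apply_subband_delay(m3_row, delay_samples, k, n_bands, fs=48000.0):
    """Approximate a broadband delay of delay_samples for one row of M3 by a
    per-subband phase factor exp(-j*2*pi*fc*d/fs), with fc the assumed
    (uniform) centre frequency of subband k."""
    fc = (k + 0.5) * fs / (2.0 * n_bands)
    phase = cmath.exp(-2j * math.pi * fc * delay_samples / fs)
    return [phase * h for h in m3_row]
```

Since only the phase changes, the magnitude response of the row, and hence the level balance, is left untouched.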
In yet another embodiment the user may want to experience different spatial sensations. In this case, reverberation and other sound effects can be efficiently introduced in the matrix M3^{n,k}.
The dynamic nature of the matrix M3^{n,k} related to the user interactivity could benefit from interpolation between two user actions. Methods of parameter interpolation are well known and are not described herein.
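As one of the well-known possibilities, a plain linear cross-fade between the old and new parameter matrices over a few time slots would avoid audible discontinuities (the function and its signature are illustrative, not taken from the patent):

```python
def interpolate_matrices(m_old, m_new, alpha):
    """Linear cross-fade between two parameter-modification matrices.
    alpha runs from 0.0 (old user setting) to 1.0 (new setting)."""
    return [[(1.0 - alpha) * a + alpha * b for a, b in zip(ra, rb)]
            for ra, rb in zip(m_old, m_new)]
```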
As already stated, the parameter modification matrix M3^{n,k} can contain additional rendering parameters that are interactive and are changed in response to user input.
The particular embodiment of the invention described above has been implemented and tested as part of the MPEG standardization effort for a binaural extension of the MPEG surround decoder. The test results from several listening tests performed by independent groups are shown in the diagram of Fig. 10. There it is clearly seen that the perceived quality of the binaural rendering from the particular embodiment of the invention is, for most test signals, better than that obtained from the standard 3D audio post-processing method shown in Fig. 5. Although the embodiments described herein refer to decoding for binaural headphone listening, it is obvious to one skilled in the art that they can also be applied to loudspeaker listening and other spatial configurations without departing from the basic idea of parameter mapping and combination.
While specific embodiments of the invention have been illustrated and described herein, it is realized that numerous other embodiments may be envisaged and that numerous additional advantages, modifications and changes will readily occur to those skilled in the art without departing from the spirit and scope of the invention. Therefore, the invention in its broader aspects is not limited to the specific details, representative devices and illustrated examples shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents. It is therefore to be understood that the appended claims are intended to cover all such modifications and changes as fall within a true spirit and scope of the invention.

Claims

1. A method of decoding a parametric multi-channel surround audio bitstream received by a parametric multi-channel decoder including the steps of:
- demultiplexing said bitstream to form a main bitstream and spatial side information,
- decoding the spatial side information to form a first set of spatial parameters,
- modifying the first set of spatial parameters to form a second set of spatial parameters,
- synthesizing from said main bitstream, based on or using the second set of spatial parameters, a surround audio signal to be provided to listening equipment.
2. A method according to claim 1, characterized in that in the step of modifying, the second set of spatial parameters are obtained by combining the first set of spatial parameters and a representation of user head related filters so that the new parameters are personalized and also the surround audio signal is personalized.
3. A method according to claim 2, characterized in that in the step of combining, the received spatial parameters and a representation of user head related filters are also combined with additional rendering parameters determined by the user.
4. A method according to claim 3, characterized in that the additional rendering parameters are interactive parameters set in response to user choices.
5. A method according to claim 3, characterized in that the additional rendering parameters are time dependent.
6. A method of transmitting digital data representing sound to a mobile unit, the digital data including a first number (N) of first channels, each first channel in particular representing sound having a special characteristic, such as sound received from a particular direction and being in a particular frequency band, the method comprising the steps:
- analyzing said digital data to determine parameters characteristic of the sound, the parameters in particular being determined to represent a spatial relationship between the sounds which are represented by the digital data in each of the first channels,
- down-mixing digital data of the first channels with one another to produce digital data in a second number (M) of second channels, the second number being smaller than the first number (M < N),
- transmitting wirelessly the digital data in the second channels and the parameters to a mobile unit,
- receiving in the mobile unit the digital data in the second channels and the parameters,
- transforming the received digital data in the second channels, based on the received parameters, to produce transformed digital data suited to be rendered to sound emitters of the mobile unit, and
- rendering the transformed digital data to the sound emitters of the mobile unit,
characterized by the additional step of modifying, before the step of transforming, the received parameters to form new parameters that are used in the transforming step.
7. A parametric surround decoder for decoding a parametric multi-channel surround audio bitstream, the bitstream including spatial parameters indicating the character of sound represented in the channels of the bitstream received by the decoder, characterized by a modifying unit for modifying the spatial parameters to form new spatial parameters used in synthesizing so that a different decoding of the original multi-channel surround sound is obtained.
8. A parametric surround decoder according to claim 7, characterized in that the modifying unit is arranged to use, in modifying the spatial parameters, a representation of user head related filters so that the new parameters are personalized and also a resulting surround audio signal is personalized.
9. A parametric surround decoder according to claim 8, characterized in that the modifying unit is arranged to use, in modifying the spatial parameters, also additional rendering parameters determined by the user.
10. A parametric surround decoder according to claim 7, characterized in that the modifying unit is arranged to modify the spatial parameters in a time dependent way.
11. A mobile terminal including a parametric surround decoder for decoding a parametric multi-channel surround audio bitstream received by the mobile unit, the bitstream including spatial parameters indicating the character of sound represented in channels of the received bitstream, characterized in that the parametric surround decoder includes a modifying unit for modifying the spatial parameters to form new spatial parameters used in synthesizing so that a different decoding of the original multi-channel surround sound is obtained.
12. A mobile terminal according to claim 11, characterized in that the modifying unit is arranged to use, in modifying the spatial parameters, a representation of user head related filters so that the new parameters are personalized and also a resulting surround audio signal is personalized.
13. A mobile terminal according to claim 12, characterized in that the modifying unit is arranged to use, in modifying the spatial parameters, also additional rendering parameters determined by the user or input from the user, such as by depressing one or more keys of the mobile unit.
14. A mobile terminal according to claim 12, characterized in that the modifying unit is arranged to modify the spatial parameters interactively in accordance with input from the user.
15. A mobile terminal according to claim 11, characterized in that the modifying unit is arranged to modify the spatial parameters in a time dependent way.
16. A method of decoding a parametric multi-channel surround audio bitstream including a first number (N) of audio channels, said bitstream received by a parametric multi-channel decoder, the method including the steps of:
- demultiplexing said bitstream to form a main bitstream and spatial side information,
- decoding the main bitstream to form separate bitstreams for said plurality of audio channels,
- decoding the spatial side information to form a first set of spatial parameters,
- synthesizing from said separate bitstreams, based on or using the first set of spatial parameters, surround audio signals in a second number (M) of audio channels suited to be provided to listening equipment, wherein the second number (M) is smaller than the first number (N).
17. A method according to claim 16, characterized in that the second number (M) is equal to 2.
18. A method according to claim 16, characterized in that the first number (N) is equal to 5 or 6.
PCT/SE2007/000006 2006-01-05 2007-01-05 Personalized decoding of multi-channel surround sound WO2007078254A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP07701092A EP1969901A2 (en) 2006-01-05 2007-01-05 Personalized decoding of multi-channel surround sound
BRPI0706285-0A BRPI0706285A2 (en) 2006-01-05 2007-01-05 methods for decoding a parametric multichannel surround audio bitstream and for transmitting digital data representing sound to a mobile unit, parametric surround decoder for decoding a parametric multichannel surround audio bitstream, and, mobile terminal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US74309606P 2006-01-05 2006-01-05
US60/743,096 2006-01-05

Publications (2)

Publication Number Publication Date
WO2007078254A2 true WO2007078254A2 (en) 2007-07-12
WO2007078254A3 WO2007078254A3 (en) 2007-08-30

Family

ID=38228634

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2007/000006 WO2007078254A2 (en) 2006-01-05 2007-01-05 Personalized decoding of multi-channel surround sound

Country Status (5)

Country Link
EP (1) EP1969901A2 (en)
CN (1) CN101433099A (en)
BR (1) BRPI0706285A2 (en)
RU (1) RU2008132156A (en)
WO (1) WO2007078254A2 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2436736A (en) * 2006-03-31 2007-10-03 Sony Corp Sound field correction system for performing correction of frequency-amplitude characteristics
US20090144063A1 (en) * 2006-02-03 2009-06-04 Seung-Kwon Beack Method and apparatus for control of randering multiobject or multichannel audio signal using spatial cue
EP2154911A1 (en) * 2008-08-13 2010-02-17 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. An apparatus for determining a spatial output multi-channel audio signal
EP2175670A1 (en) * 2008-10-07 2010-04-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Binaural rendering of a multi-channel audio signal
US7746964B2 (en) 2005-12-13 2010-06-29 Sony Corporation Signal processing apparatus and signal processing method
US8199932B2 (en) 2006-11-29 2012-06-12 Sony Corporation Multi-channel, multi-band audio equalization
US8280075B2 (en) 2007-02-05 2012-10-02 Sony Corporation Apparatus, method and program for processing signal and method for generating signal
WO2015080994A1 (en) * 2013-11-27 2015-06-04 Dolby Laboratories Licensing Corporation Audio signal processing
WO2016003206A1 (en) * 2014-07-01 2016-01-07 한국전자통신연구원 Multichannel audio signal processing method and device
US9805727B2 (en) 2013-04-03 2017-10-31 Dolby Laboratories Licensing Corporation Methods and systems for generating and interactively rendering object based audio
US9848272B2 (en) 2013-10-21 2017-12-19 Dolby International Ab Decorrelator structure for parametric reconstruction of audio signals
US9883308B2 (en) 2014-07-01 2018-01-30 Electronics And Telecommunications Research Institute Multichannel audio signal processing method and device
US9900692B2 (en) 2014-07-09 2018-02-20 Sony Corporation System and method for playback in a speaker system
US9933989B2 (en) 2013-10-31 2018-04-03 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
WO2018154175A1 (en) * 2017-02-17 2018-08-30 Nokia Technologies Oy Two stage audio focus for spatial audio processing
US10410644B2 (en) 2011-03-28 2019-09-10 Dolby Laboratories Licensing Corporation Reduced complexity transform for a low-frequency-effects channel
CN110797037A (en) * 2013-07-31 2020-02-14 杜比实验室特许公司 Method and apparatus for processing audio data, medium, and device

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556799B (en) * 2009-05-14 2013-08-28 华为技术有限公司 Audio decoding method and audio decoder
US9584912B2 (en) 2012-01-19 2017-02-28 Koninklijke Philips N.V. Spatial audio rendering and encoding
WO2016089133A1 (en) * 2014-12-04 2016-06-09 가우디오디오랩 주식회사 Binaural audio signal processing method and apparatus reflecting personal characteristics
EP3220668A1 (en) * 2016-03-15 2017-09-20 Thomson Licensing Method for configuring an audio rendering and/or acquiring device, and corresponding audio rendering and/or acquiring device, system, computer readable program product and computer readable storage medium
CN106373582B (en) * 2016-08-26 2020-08-04 腾讯科技(深圳)有限公司 Method and device for processing multi-channel audio
EP4138396A4 (en) * 2020-05-21 2023-07-05 Huawei Technologies Co., Ltd. Audio data transmission method, and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030035553A1 (en) * 2001-08-10 2003-02-20 Frank Baumgarte Backwards-compatible perceptual coding of spatial cues
WO2005041447A1 (en) * 2003-10-22 2005-05-06 Unwired Technology Llc Multiple channel wireless communication system
WO2007004830A1 (en) * 2005-06-30 2007-01-11 Lg Electronics Inc. Apparatus for encoding and decoding audio signal and method thereof
WO2007031896A1 (en) * 2005-09-13 2007-03-22 Koninklijke Philips Electronics N.V. Audio coding
US20070094037A1 (en) * 2005-08-30 2007-04-26 Pang Hee S Slot position coding for non-guided spatial audio coding


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: 'Model-based HRTF parameter interpolation' IP.COM JOURNAL, IP.COM INC., WEST HENRIETTA, NY, US 05 September 2006, pages 1 - 5, XP003012896 *
See also references of EP1969901A2 *

WO2010040456A1 (en) * 2008-10-07 2010-04-15 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Binaural rendering of a multi-channel audio signal
AU2009301467B2 (en) * 2008-10-07 2013-08-01 Dolby International Ab Binaural rendering of a multi-channel audio signal
US8325929B2 (en) 2008-10-07 2012-12-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Binaural rendering of a multi-channel audio signal
US10410644B2 (en) 2011-03-28 2019-09-10 Dolby Laboratories Licensing Corporation Reduced complexity transform for a low-frequency-effects channel
US11081118B2 (en) 2013-04-03 2021-08-03 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
US11270713B2 (en) 2013-04-03 2022-03-08 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
US11769514B2 (en) 2013-04-03 2023-09-26 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
US11948586B2 (en) 2013-04-03 2024-04-02 Dolby Laboratories Licensing Corporation Methods and systems for generating and rendering object based audio with conditional rendering metadata
US10553225B2 (en) 2013-04-03 2020-02-04 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
US11727945B2 (en) 2013-04-03 2023-08-15 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
US10515644B2 (en) 2013-04-03 2019-12-24 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
US9997164B2 (en) 2013-04-03 2018-06-12 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
US10748547B2 (en) 2013-04-03 2020-08-18 Dolby Laboratories Licensing Corporation Methods and systems for generating and rendering object based audio with conditional rendering metadata
US10832690B2 (en) 2013-04-03 2020-11-10 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
US9881622B2 (en) 2013-04-03 2018-01-30 Dolby Laboratories Licensing Corporation Methods and systems for generating and rendering object based audio with conditional rendering metadata
US11568881B2 (en) 2013-04-03 2023-01-31 Dolby Laboratories Licensing Corporation Methods and systems for generating and rendering object based audio with conditional rendering metadata
US10276172B2 (en) 2013-04-03 2019-04-30 Dolby Laboratories Licensing Corporation Methods and systems for generating and interactively rendering object based audio
US9805727B2 (en) 2013-04-03 2017-10-31 Dolby Laboratories Licensing Corporation Methods and systems for generating and interactively rendering object based audio
US10388291B2 (en) 2013-04-03 2019-08-20 Dolby Laboratories Licensing Corporation Methods and systems for generating and rendering object based audio with conditional rendering metadata
CN110797037A (en) * 2013-07-31 2020-02-14 杜比实验室特许公司 Method and apparatus for processing audio data, medium, and device
US11736890B2 (en) 2013-07-31 2023-08-22 Dolby Laboratories Licensing Corporation Method, apparatus or systems for processing audio objects
US9848272B2 (en) 2013-10-21 2017-12-19 Dolby International Ab Decorrelator structure for parametric reconstruction of audio signals
US10255027B2 (en) 2013-10-31 2019-04-09 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
US10503461B2 (en) 2013-10-31 2019-12-10 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
US11269586B2 (en) 2013-10-31 2022-03-08 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
US10838684B2 (en) 2013-10-31 2020-11-17 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
US11681490B2 (en) 2013-10-31 2023-06-20 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
US9933989B2 (en) 2013-10-31 2018-04-03 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
US12061835B2 (en) 2013-10-31 2024-08-13 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
US10142763B2 (en) 2013-11-27 2018-11-27 Dolby Laboratories Licensing Corporation Audio signal processing
US20170026771A1 (en) * 2013-11-27 2017-01-26 Dolby Laboratories Licensing Corporation Audio Signal Processing
WO2015080994A1 (en) * 2013-11-27 2015-06-04 Dolby Laboratories Licensing Corporation Audio signal processing
DE112015003108B4 (en) * 2014-07-01 2021-03-04 Electronics And Telecommunications Research Institute Method and device for processing a multi-channel audio signal
US10645515B2 (en) 2014-07-01 2020-05-05 Electronics And Telecommunications Research Institute Multichannel audio signal processing method and device
US10264381B2 (en) 2014-07-01 2019-04-16 Electronics And Telecommunications Research Institute Multichannel audio signal processing method and device
US9883308B2 (en) 2014-07-01 2018-01-30 Electronics And Telecommunications Research Institute Multichannel audio signal processing method and device
WO2016003206A1 (en) * 2014-07-01 2016-01-07 한국전자통신연구원 Multichannel audio signal processing method and device
US9900692B2 (en) 2014-07-09 2018-02-20 Sony Corporation System and method for playback in a speaker system
KR102214205B1 (en) * 2017-02-17 2021-02-10 노키아 테크놀로지스 오와이 2-stage audio focus for spatial audio processing
US10785589B2 (en) 2017-02-17 2020-09-22 Nokia Technologies Oy Two stage audio focus for spatial audio processing
KR20190125987A (en) * 2017-02-17 2019-11-07 노키아 테크놀로지스 오와이 Two-stage audio focus for spatial audio processing
WO2018154175A1 (en) * 2017-02-17 2018-08-30 Nokia Technologies Oy Two stage audio focus for spatial audio processing

Also Published As

Publication number Publication date
CN101433099A (en) 2009-05-13
WO2007078254A3 (en) 2007-08-30
EP1969901A2 (en) 2008-09-17
RU2008132156A (en) 2010-02-10
BRPI0706285A2 (en) 2011-03-22

Similar Documents

Publication Publication Date Title
EP2000001B1 (en) Method and arrangement for a decoder for multi-channel surround sound
WO2007078254A2 (en) Personalized decoding of multi-channel surround sound
JP7564295B2 (en) Apparatus, method, and computer program for encoding, decoding, scene processing, and other procedures for DirAC-based spatial audio coding
Herre et al. MPEG-H 3D audio—The new standard for coding of immersive spatial audio
US8266195B2 (en) Filter adaptive frequency resolution
Engdegard et al. Spatial audio object coding (SAOC)—the upcoming MPEG standard on parametric object based audio coding
CN103489449B (en) Audio signal decoder, method for providing upmix signal representation state
KR101358700B1 (en) Audio encoding and decoding
US8880413B2 (en) Binaural spatialization of compression-encoded sound data utilizing phase shift and delay applied to each subband
US9219972B2 (en) Efficient audio coding having reduced bit rate for ambient signals and decoding using same
JP6134867B2 (en) Renderer controlled space upmix
CN111970629B (en) Audio decoder and decoding method
Breebaart et al. Spatial audio object coding (SAOC)-the upcoming MPEG standard on parametric object based audio coding
JP2009543142A (en) Concept for synthesizing multiple parametrically encoded sound sources
US10013993B2 (en) Apparatus and method for surround audio signal processing
CN112218229A (en) Method and apparatus for binaural dialog enhancement
Breebaart et al. Binaural rendering in MPEG Surround
Quackenbush et al. MPEG surround
WO2008084436A1 (en) An object-oriented audio decoder
Peters et al. Scene-based audio implemented with higher order ambisonics
Herre Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio
Plogsties et al. MPEG Sorround binaural rendering-Sorround sound for mobile devices (Binaurale Wiedergabe mit MPEG Sorround-Sorround sound fuer mobile Geraete)
Meng Virtual sound source positioning for un-fixed speaker set up
Breebaart et al. 19th International Congress on Acoustics, Madrid, 2-7 September 2007
EA047653B1 (en) AUDIO ENCODING AND DECODING USING REPRESENTATION TRANSFORMATION PARAMETERS

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase; Ref document number: 2007701092; Country of ref document: EP
WWE Wipo information: entry into national phase; Ref document number: 200780001908.2; Country of ref document: CN
NENP Non-entry into the national phase in: DE
WWE Wipo information: entry into national phase; Ref document number: 6071/DELNP/2008; Country of ref document: IN
ENP Entry into the national phase in: RU; Ref document number: 2008132156; Kind code of ref document: A
ENP Entry into the national phase in: BR; Ref document number: PI0706285; Kind code of ref document: A2; Effective date: 20080701