EP2860728A1 - Method and apparatus for encoding and decoding directional side information
- Publication number
- EP2860728A1 (application number EP20130306391 / EP13306391A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- directions
- audio signal
- direction values
- frame
- quantised
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/11—Application of ambisonics in stereophonic audio systems
Definitions
- the invention relates to a method and to an apparatus for encoding and for decoding directional side information for a 3D audio signal.
- HOA Higher Order Ambisonics
- WFS wave field synthesis
- channel based approaches like 22.2.
- the HOA representation offers the advantage of being independent of a specific loudspeaker set-up. But this flexibility is at the expense of a decoding process which is required for the playback of the HOA representation on a particular loudspeaker set-up.
- HOA may also be rendered to set-ups consisting of only few loudspeakers.
- a further advantage of HOA is that the same representation can also be employed without any modification for binaural rendering to headphones.
- HOA is based on the representation of the spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion.
- SH Spherical Harmonics
- Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function.
- O denotes the number of expansion coefficients.
- these time domain functions are referred to as HOA coefficient sequences or as HOA channels in the following.
- An HOA representation can be expressed as a temporal sequence of HOA data frames containing HOA coefficients.
- the compression of HOA sound field representations is proposed in patent applications EP 12305537.8 , EP 12306569.0 and EP 13305558.2 : these approaches have in common that they perform a sound field analysis and decompose the given HOA representation into a directional component and a residual ambient component.
- the resulting compressed representation comprises a number of quantised signals, resulting from the perceptual coding of the directional signals and relevant coefficient sequences of the ambient HOA component.
- the resulting compressed representation comprises additional side information related to the quantised signals, which side information is necessary for the reconstruction of the HOA representation from its compressed version.
- a problem to be solved by the invention is to further improve the compression of HOA representations. This problem is solved by the methods disclosed in claims 1 and 10. Apparatuses utilising these methods are disclosed in claims 2 and 11.
- the invention deals with the coding of the side information related to the directional component, which additional compression is not addressed in the above-mentioned patent applications EP 12305537.8 , EP 12306569.0 and EP 13305558.2 .
- as in EP 12305537.8, in order to efficiently code or compress a given HOA representation, it is analysed on a frame-by-frame basis and is decomposed into a directional component and a residual ambient component. At compressor side the direction values are estimated based on a pre-defined grid of directions, and these direction values are used for the extraction of directional signals from the given HOA representation in the HOA compressor.
- the resulting indices of directional signals as well as the direction values are encoded in a particular manner.
- the inventive method is suited for encoding directional side information for a 3D audio signal, and includes the steps:
- the inventive apparatus is suited for encoding directional side information for a 3D audio signal, said apparatus including:
- the inventive method is suited for decoding directional side information for a 3D audio signal which directional side information was encoded according to the above encoding method, and includes the steps:
- the inventive apparatus is suited for decoding directional side information for a 3D audio signal, which directional side information was encoded according to the above encoding method, said apparatus including means being adapted for:
- Fig. 1 shows an HOA compression processing as described in patent application EP 13305558.2 , in which - following estimation of dominant sound source directions - a coding of directional side information is carried out.
- for the HOA representation compression, a frame-wise processing with non-overlapping input frames C(k) of HOA coefficient sequences of length L is used, where k denotes the frame index.
- This long frame is 50% overlapped with an adjacent long frame and is successively used for the estimation of dominant sound source directions. If step/stage 11/12 is not present, the tilde symbol has no specific meaning.
- in step or stage 13, dominant sound sources are estimated.
- the estimation provides a data set of indices of directional signals that have been detected, which is a subset of $\{1, \ldots, D\}$, as well as the set of corresponding direction estimates.
- D denotes the maximum number of directional signals that has to be set before starting the HOA compression.
- in step or stage 14, the current frame $\tilde{C}(k)$ of HOA coefficient sequences is decomposed into a number of directional signals $X_{\mathrm{DIR}}(k-2)$ belonging to the directions contained in the set of direction estimates, and a residual ambient HOA component $C_{\mathrm{AMB}}(k-2)$. The delay of two frames is introduced as a result of overlap-add processing in order to obtain smooth signals.
- $X_{\mathrm{DIR}}(k-2)$ contains a total of D channels, of which however only those corresponding to the active directional signals are non-zero.
- the indices specifying these channels are assumed to be output in the data set $\mathcal{I}_{\mathrm{DIR,ACT}}(k-2)$.
- the decomposition in step/stage 14 provides some prediction parameters for frame k-2, which are used at decompression side for predicting portions of the original HOA representation from the directional signals.
- in step or stage 15, the number of coefficients of the ambient HOA component $C_{\mathrm{AMB}}(k-2)$ is reduced so as to contain only $O_{\mathrm{RED}} + D - N_{\mathrm{DIR,ACT}}(k-2)$ non-zero HOA coefficient sequences, where $N_{\mathrm{DIR,ACT}}(k-2)$ -
- the final ambient HOA representation with the reduced number of $O_{\mathrm{RED}} + D - N_{\mathrm{DIR,ACT}}(k-2)$ non-zero coefficient sequences is denoted by $C_{\mathrm{AMB,RED}}(k-2)$.
- the indices of the chosen ambient HOA coefficient sequences are output in the data set $\mathcal{I}_{\mathrm{AMB,ACT}}(k-2)$.
- the active directional signals contained in $X_{\mathrm{DIR}}(k-2)$ and the HOA coefficient sequences contained in $C_{\mathrm{AMB,RED}}(k-2)$ are assigned to the frame $Y(k-2)$ of I channels for individual perceptual encoding.
- the data set of indices of directional signals (a subset of $\{1, \ldots, D\}$) and the data set of corresponding direction value estimates from the estimation step/stage 13 are fed to a step or stage 18 that encodes the directional side information as described in the following.
- Step/stage 18 outputs a vector $a(k)$ denoting which directional signals are active in frame k, as well as a coded representation of all directions.
- the values of this coded representation can be entropy encoded.
- Step/stage 34 receives vector $a(k)$ denoting which directional signals are active in frame k, and the coded representation of all directions. Step/stage 34 decodes this directional side information as described below and outputs the data set of indices of directional signals (a subset of $\{1, \ldots, D\}$) and the set of corresponding direction estimates. In step or stage 31, a perceptual decoding of the I perceptually encoded signals is performed in order to obtain the I decoded signals in $\hat{Y}(k-2)$.
- the perceptually decoded signals in $\hat{Y}(k-2)$ are redistributed in order to recreate the frame $\hat{X}_{\mathrm{DIR}}(k-2)$ of directional signals and the frame $\hat{C}_{\mathrm{AMB,RED}}(k-2)$ of the ambient HOA component.
- the information about how to re-distribute the signals is obtained by reproducing the assigning operation performed for the HOA compression, using the index data sets $\mathcal{I}_{\mathrm{DIR,ACT}}(k)$ and $\mathcal{I}_{\mathrm{AMB,ACT}}(k-2)$.
- in composition step or stage 33, a current frame $\hat{C}(k-3)$ of the desired total HOA representation is re-composed using the frame $\hat{X}_{\mathrm{DIR}}(k-2)$ of the directional signals, the set of the active directional signal indices together with the set of the corresponding directions, the parameters for predicting portions of the HOA representation from the directional signals, and the frame $\hat{C}_{\mathrm{AMB,RED}}(k-2)$ of HOA coefficient sequences of the reduced ambient HOA component.
- I.e., directional signals with respect to uniformly distributed directions are predicted from the directional signals ($\hat{X}_{\mathrm{DIR}}(k-2)$) using the received prediction parameters, and thereafter the current decompressed frame ($\hat{C}(k-3)$) is re-composed from the frame of directional signals ($\hat{X}_{\mathrm{DIR}}(k-2)$), the predicted portions and the reduced ambient HOA component ($\hat{C}_{\mathrm{AMB,RED}}(k-2)$).
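- the directional HOA component for the k-th frame is represented by a number $D_{\mathrm{ACT}}(k)$ of directional signals and additional side information.
- the side information consists of the set $\{ i_{\mathrm{ACT},d}(k) \mid d = 1, \ldots, D_{\mathrm{ACT}}(k) \}$ of indices $i_{\mathrm{ACT},d}(k)$ of directional signals that have been detected.
- the side information further consists of the set $\mathcal{G}_{\Omega,\mathrm{ACT}}(k) := \{ \Omega_{\mathrm{ACT},d}(k) \mid d = 1, \ldots, D_{\mathrm{ACT}}(k) \}$ of the corresponding directions $\Omega_{\mathrm{ACT},d}(k)$.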
- Fig. 4 illustrates an exemplary result of the direction estimation for the first 7 frames (cf. Fig. 3 and the corresponding description of the representation of a direction in a spherical coordinate system).
- the dots in Fig. 4 represent a grid of possible directions.
- the direction estimates related to the directional signal with index 1 are marked by diamonds and the direction estimates related to the directional signal with index 2 are marked by crosses.
- this activity information is additionally illustrated in Fig. 5, which shows for each frame index k whether the direction with the respective index is active (indicated by white) or not (indicated by black).
- a ( k ) will contain e.g. 4 to 8 bits per frame.
- a current frame k will contain none, part or all of this set of pre-determined directional signals.
- the current vector a ( k ) is transferred to the decoder or decompression side.
- the coded direction value $\Omega_{\mathrm{ACT},1}(k)$ corresponds to the index indicated by the first non-zero element in $a(k)$
- the coded direction value $\Omega_{\mathrm{ACT},2}(k)$ is assumed to correspond to the index indicated by the second non-zero element in $a(k)$, etc.
- the coded representation of all direction values in the set is transferred to the decoder or decompression side.
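The pairing between the activity vector and the coded direction values described above can be sketched as follows. This is a minimal illustration, not the patent's bitstream format: the function names `encode_activity` and `decode_activity`, the use of a Python dict, and the representation of a direction value as an integer are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def encode_activity(active: Dict[int, int], D: int) -> Tuple[List[int], List[int]]:
    """Build a(k) and the ordered list of direction values for one frame.

    `active` maps a directional-signal index (1..D) to its quantised
    direction value (hypothetical integer representation).
    """
    # a(k): one bit per possible directional signal, 1 = active in this frame.
    a = [1 if d in active else 0 for d in range(1, D + 1)]
    # Direction values are listed in ascending order of signal index, so the
    # n-th value belongs to the n-th non-zero element of a(k).
    directions = [active[d] for d in range(1, D + 1) if d in active]
    return a, directions


def decode_activity(a: List[int], directions: List[int]) -> Dict[int, int]:
    """Recover the index-to-direction mapping from a(k) and the value list."""
    indices = [pos + 1 for pos, bit in enumerate(a) if bit]
    if len(indices) != len(directions):
        raise ValueError("number of direction values must match non-zero bits")
    return dict(zip(indices, directions))
```

With D = 4 and signals 2 and 4 active, `encode_activity({2: 17, 4: 5}, 4)` yields the vector `[0, 1, 0, 1]` and the value list `[17, 5]`, and `decode_activity` restores the original mapping.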
- the disadvantage of such individual or specific quantisation processing is that such specifically quantised direction values will likely not match the pre-defined grid of directions exactly. To avoid errors caused by direction quantisation when the HOA decompressor re-synthesises the HOA representation of the directional signals, the extraction of directional signals from the given HOA representation in the HOA compressor in step/stage 13 in Fig. 1 is based on that pre-defined grid of directions.
- Patent applications EP 12306569.0 and EP 13305558.2 describe how directional signals can be extracted from an HOA representation.
- the problem that the directions of the quantised direction values do not exactly match the estimated directions can be solved in a first embodiment by exploiting the fact that the splitting into directional and ambient residual components and the direction estimation described in patent applications EP 12306569.0 and EP 13305558.2 are based on a direction search which is carried out on a fixed grid of directions (cf. patent application EP 13305156.5 for a description of direction estimation as an example).
- a fixed grid represents the above-mentioned re-quantisation.
- the estimated direction $\Omega_{\mathrm{ACT},d}(k)$ is an element of a set $\{ \Omega_q \mid q = 1, \ldots, Q \}$ of Q predefined directions.
- Such coding of the directions offers the further advantage that it is not recursive, meaning that no knowledge of the direction estimates from previous frames is required for the decoding of the directions.
- a disadvantage of such processing is that in general it does not achieve the related minimum possible average bit rate.
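Coding a direction by its index in the fixed grid can be sketched as below. The fixed-length binary representation with ceil(log2 Q) bits per index is an assumption chosen for illustration (the text only states that the direction is one of Q predefined grid directions), and the grid size Q = 900 in the usage note is a hypothetical value.

```python
from typing import List


def bits_per_index(Q: int) -> int:
    """Number of bits needed to address Q grid directions (at least 1)."""
    return max(1, (Q - 1).bit_length())


def encode_indices(indices: List[int], Q: int) -> str:
    """Encode 1-based grid indices as fixed-length binary words."""
    n = bits_per_index(Q)
    for q in indices:
        if not 1 <= q <= Q:
            raise ValueError(f"grid index {q} outside 1..{Q}")
    return "".join(format(q - 1, f"0{n}b") for q in indices)


def decode_indices(bits: str, Q: int) -> List[int]:
    """Decode fixed-length words; no state from earlier frames is needed."""
    n = bits_per_index(Q)
    if len(bits) % n:
        raise ValueError("bit string length is not a multiple of the word size")
    return [int(bits[i:i + n], 2) + 1 for i in range(0, len(bits), n)]
```

For a hypothetical grid of Q = 900 directions each index takes 10 bits, regardless of how likely the direction is. That is what makes the scheme non-recursive, and also why it generally stays above the minimum achievable average bit rate.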
- the possibly entropy encoded representation of all directions is received, wherein these direction values were quantised according to said pre-defined grid, and vector $a(k)$ is received that comprises the encoded indices indicating which directions from the set of pre-defined directions are present in the current audio signal frame C(k). If necessary, an entropy decoding takes place.
- the quantised direction values are re-quantised according to said pre-defined grid, and vector $a(k)$ is decoded.
- the average bit rate for the coding of the directions for successive frames is further reduced by exploiting the relation between the direction estimates of successive frames.
- the direction estimation as proposed in patent application EP 13305558.2 is based on a sound source movement model, which predicts the direction of a sound source in the k-th frame based on its movement between the (k-2)-th and (k-1)-th frame.
- the quantised direction values (e.g. the direction index as proposed in the first embodiment) of the k-th frame are in the second embodiment coded using entropy coding such as Huffman coding.
- the individual code words for the direction values have a variable bit size depending on the frame-adaptively determined probability of the individual directions.
- direction values with a high probability are coded using small-size code words and direction values with a low probability are coded using large-size code words.
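The relation between probability and code word length can be illustrated with a standard Huffman construction. This is a generic textbook sketch, not the coding tables of the patent; the function names and the example probabilities are assumptions.

```python
import heapq
from itertools import count
from typing import Dict, Hashable, List


def huffman_code(probs: Dict[Hashable, float]) -> Dict[Hashable, str]:
    """Return a prefix-free code table {symbol: bit string}."""
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        p0, _, t0 = heapq.heappop(heap)  # two least probable subtrees
        p1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def huffman_decode(bits: str, table: Dict[Hashable, str]) -> List[Hashable]:
    """Decode a bit string using the same table as the encoder."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

For four grid directions with assumed probabilities 0.6, 0.25, 0.1 and 0.05, the most probable direction receives a 1-bit code word and the two least probable ones receive 3-bit code words. The decoder needs the identical table, which is why the probabilities must be computed in the same way on both sides.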
- Such a coding strategy requires computation of the probability for the individual directions during HOA decompression in the same way as for the HOA compression.
- the received entropy encoded quantised direction values are entropy decoded, wherein a probability of the individual directions is determined frame-adaptively.
- this requires a high computational complexity in the HOA decompressor for computing the probabilities of the Q possible directions in each frame.
- the processing is recursive, meaning that the decoding of a direction at decompression side is based on the knowledge of the directions from the previous two frames.
- the number of possible probabilities is constrained to Q by making the probabilities dependent on the corresponding direction estimate in the last frame.
- the conditional probabilities are set inversely proportional to the angular distance between a direction estimate in the current frame and the corresponding direction estimate in the last frame.
- Another possibility is to measure the conditional a-priori probabilities of the direction estimates from some HOA representations instead of setting them.
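Setting the conditional probabilities from angular distances can be sketched as follows. Directions are given here as (inclination, azimuth) pairs in radians and grid indices are 0-based, both of which are conventions chosen for the example. The small offset `eps` is an assumption that is not in the text: it keeps the weight finite when the current direction equals the previous one, so the result is only approximately inversely proportional to the distance.

```python
import math
from typing import List, Tuple

Direction = Tuple[float, float]  # (inclination theta, azimuth phi) in radians


def unit_vector(theta: float, phi: float) -> Tuple[float, float, float]:
    """Cartesian unit vector of a direction in spherical coordinates."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def angular_distance(a: Direction, b: Direction) -> float:
    """Angle in radians between two directions."""
    dot = sum(x * y for x, y in zip(unit_vector(*a), unit_vector(*b)))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp rounding errors


def conditional_probabilities(grid: List[Direction], prev_index: int,
                              eps: float = 0.1) -> List[float]:
    """P(current = q | previous = prev_index) for every grid direction q."""
    weights = [1.0 / (angular_distance(grid[prev_index], g) + eps)
               for g in grid]
    total = sum(weights)
    return [w / total for w in weights]
```

Because the table depends only on the previous direction, there are at most Q distinct tables, and they can be computed once in advance rather than in every frame. Each table could then feed an entropy code such as a Huffman code.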
- the received entropy encoded quantised direction values are frame-adaptively entropy decoded depending on a probability of the individual directions, whereby the number of possible probabilities is constrained to the number of directions in the pre-defined grid and the probabilities are dependent on the corresponding direction estimate in the last frame.
- an entropy decoding table e.g. a Huffman table
- the non-conditional probabilities of the quantised directions are employed.
- Such probabilities can be measured from some test HOA representations, or can be assigned according to expectations about typical HOA sound field representation. For example, high probabilities are assigned for directions in the front and low probabilities for directions in the back.
- Such a processing has the advantage of not being recursive, i.e. the decoding of a direction value is not based on the knowledge of any directions from previous frames.
- the efficiency of this kind of processing is likely lower than that of the third embodiment.
- the mode decision can be indicated by a Boolean variable which is prepended to the coded representation of a direction. Such a mode decision will in most cases minimise the bit amount of the corresponding code.
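A mode flag prepended to the coded direction can be sketched as below. Which two coding modes are compared is not specified in this passage, so the two candidate bit strings are simply inputs here; the function names and the flag polarity (0 for the first mode, 1 for the second) are assumptions.

```python
from typing import Tuple


def encode_with_mode(code_mode0: str, code_mode1: str) -> str:
    """Pick the shorter candidate and prepend a one-bit mode flag."""
    if len(code_mode1) < len(code_mode0):
        return "1" + code_mode1
    return "0" + code_mode0  # ties go to mode 0


def split_mode(bits: str) -> Tuple[int, str]:
    """Separate the mode flag from the payload on the decoder side."""
    if not bits:
        raise ValueError("empty code")
    return int(bits[0]), bits[1:]
```

The flag costs one extra bit per direction, so the choice only pays off when the two candidates differ in length by more than that, which is consistent with the statement that the bit amount is minimised in most cases rather than always.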
- inventive processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the inventive processing.
- the invention can be applied in any application where some directional information has to be efficiently coded, e.g. object based 3D audio where directional signals and object based side information have to be coded.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Stereophonic System (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20130306391 EP2860728A1 (de) | 2013-10-09 | 2013-10-09 | Method and apparatus for encoding and decoding directional side information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20130306391 EP2860728A1 (de) | 2013-10-09 | 2013-10-09 | Method and apparatus for encoding and decoding directional side information |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2860728A1 (de) | 2015-04-15 |
Family
ID=49448078
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20130306391 (Withdrawn) EP2860728A1 (de) | 2013-10-09 | 2013-10-09 | Method and apparatus for encoding and decoding directional side information |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP2860728A1 (de) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019197713A1 (en) * | 2018-04-09 | 2019-10-17 | Nokia Technologies Oy | Quantization of spatial audio parameters |
WO2020089509A1 (en) * | 2018-10-31 | 2020-05-07 | Nokia Technologies Oy | Determination of spatial audio parameter encoding and associated decoding |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070269063A1 (en) * | 2006-05-17 | 2007-11-22 | Creative Technology Ltd | Spatial audio coding based on universal spatial cues |
US20090083044A1 (en) * | 2006-03-15 | 2009-03-26 | France Telecom | Device and Method for Encoding by Principal Component Analysis a Multichannel Audio Signal |
US20110249821A1 (en) * | 2008-12-15 | 2011-10-13 | France Telecom | encoding of multichannel digital audio signals |
Non-Patent Citations (3)
Title |
---|
HIRVONEN TONI ET AL: "Perceptual Compression Methods for Metadata in Directional Audio Coding Applied to Audiovisual Teleconference", AES CONVENTION 126; MAY 2009, AES, 60 EAST 42ND STREET, ROOM 2520 NEW YORK 10165-2520, USA, 7 May 2009 (2009-05-07), XP040508988 * |
JING WANG ET AL: "Context-based adaptive arithmetic coding in time and frequency domain for the lossless compression of audio coding parameters at variable rate", EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING, 21 May 2013 (2013-05-21), pages 1, XP055104567, Retrieved from the Internet <URL:http://asmp.eurasipjournals.com/content/pdf/1687-4722-2013-9.pdf> [retrieved on 20140226] * |
MANUEL BRIAND: "Codage paramétrique basé sur l'Analyse en Composante Principale", 20 March 2007 (2007-03-20), pages 133 - 147, XP002661237, Retrieved from the Internet <URL:http://tel.archives-ouvertes.fr/docs/00/14/18/62/PDF/2007_Briand_ParametricAudioCoding.pdf> [retrieved on 20101013] * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019197713A1 (en) * | 2018-04-09 | 2019-10-17 | Nokia Technologies Oy | Quantization of spatial audio parameters |
KR20200140874A (ko) * | 2018-04-09 | 2020-12-16 | 노키아 테크놀로지스 오와이 | 공간 오디오 파라미터의 양자화 |
EP3776545A4 (de) * | 2018-04-09 | 2022-01-05 | Nokia Technologies Oy | Quantisierung von räumlichen audioparametern |
US11475904B2 (en) | 2018-04-09 | 2022-10-18 | Nokia Technologies Oy | Quantization of spatial audio parameters |
WO2020089509A1 (en) * | 2018-10-31 | 2020-05-07 | Nokia Technologies Oy | Determination of spatial audio parameter encoding and associated decoding |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11284210B2 (en) | Methods and apparatus for compressing and decompressing a higher order ambisonics representation | |
US11869523B2 (en) | Method and apparatus for decoding a bitstream including encoded higher order ambisonics representations | |
EP2860728A1 (de) | Method and apparatus for encoding and decoding directional side information | |
US20040230423A1 (en) | Multiple channel mode decisions and encoding |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20131009 |
AK | Designated contracting states | Kind code of ref document: A1. Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
AX | Request for extension of the european patent | Extension state: BA ME |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn | Effective date: 20151016 |