EP2860728A1 - Method and apparatus for encoding and for decoding directional side information - Google Patents
- Publication number
- EP2860728A1 (application number EP13306391A, also listed as EP20130306391)
- Authority
- EP
- European Patent Office
- Prior art keywords
- directions
- audio signal
- direction values
- frame
- quantised
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
- H04S2420/11—Application of ambisonics in stereophonic audio systems
Definitions
- the invention relates to a method and to an apparatus for encoding and for decoding directional side information for a 3D audio signal.
- HOA Higher Order Ambisonics
- WFS wave field synthesis
- channel based approaches like 22.2.
- the HOA representation offers the advantage of being independent of a specific loudspeaker set-up. But this flexibility is at the expense of a decoding process which is required for the playback of the HOA representation on a particular loudspeaker set-up.
- HOA may also be rendered to set-ups consisting of only few loudspeakers.
- a further advantage of HOA is that the same representation can also be employed without any modification for binaural rendering to headphones.
- HOA is based on the representation of the spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion.
- SH Spherical Harmonics
- Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function.
- O denotes the number of expansion coefficients.
- HOA coefficient sequences or as HOA channels in the following.
- An HOA representation can be expressed as a temporal sequence of HOA data frames containing HOA coefficients.
- the compression of HOA sound field representations is proposed in patent applications EP 12305537.8 , EP 12306569.0 and EP 13305558.2 : these approaches have in common that they perform a sound field analysis and decompose the given HOA representation into a directional component and a residual ambient component.
- the resulting compressed representation comprises a number of quantised signals, resulting from the perceptual coding of the directional signals and relevant coefficient sequences of the ambient HOA component.
- the resulting compressed representation comprises additional side information related to the quantised signals, which side information is necessary for the reconstruction of the HOA representation from its compressed version.
- a problem to be solved by the invention is to further improve the compression of HOA representations. This problem is solved by the methods disclosed in claims 1 and 10. Apparatuses utilising these methods are disclosed in claims 2 and 11.
- the invention deals with the coding of the side information related to the directional component, which additional compression is not addressed in the above-mentioned patent applications EP 12305537.8 , EP 12306569.0 and EP 13305558.2 .
- EP 12305537.8 in order to efficiently code or compress a given HOA representation, it is analysed on a frame-by-frame basis and is decomposed into a directional component and a residual ambient component, whereby at compressor side the direction values are estimated based on a pre-defined grid of directions, and these direction values are used for the extraction of directional signals from the given HOA representation in the HOA compressor.
- the resulting indices of directional signals as well as the direction values are encoded in a particular manner.
- the inventive method is suited for encoding directional side information for a 3D audio signal, and includes the steps:
- the inventive apparatus is suited for encoding directional side information for a 3D audio signal, said apparatus including:
- the inventive method is suited for decoding directional side information for a 3D audio signal which directional side information was encoded according to the above encoding method, and includes the steps:
- the inventive apparatus is suited for decoding directional side information for a 3D audio signal, which directional side information was encoded according to the above encoding method, said apparatus including means being adapted for:
- Fig. 1 shows an HOA compression processing as described in patent application EP 13305558.2 , in which - following estimation of dominant sound source directions - a coding of directional side information is carried out.
- for the HOA representation compression, a frame-wise processing with non-overlapping input frames C(k) of HOA coefficient sequences of length L is used, where k denotes the frame index.
- This long frame is 50% overlapped with an adjacent long frame and is successively used for the estimation of dominant sound source directions. If step/stage 11/12 is not present, the tilde symbol has no specific meaning.
- in step or stage 13, dominant sound sources are estimated.
- the estimation provides a data set, a subset of {1, ..., D}, of indices of directional signals that have been detected, as well as the set of corresponding direction estimates.
- D denotes the maximum number of directional signals, which has to be set before starting the HOA compression.
- in step or stage 14, the current frame C̃(k) of HOA coefficient sequences is decomposed into a number of directional signals X_DIR(k-2) belonging to the directions contained in the set of direction estimates and a residual ambient HOA component C_AMB(k-2). The delay of two frames is introduced as a result of overlap-add processing in order to obtain smooth signals.
- X_DIR(k-2) contains a total of D channels, of which, however, only those corresponding to the active directional signals are non-zero.
- the indices specifying these channels are assumed to be output in the data set I_DIR,ACT(k-2).
- the decomposition in step/stage 14 provides some parameters ζ(k-2), which are used at the decompression side for predicting portions of the original HOA representation from the directional signals.
- in step or stage 15, the number of coefficients of the ambient HOA component C_AMB(k-2) is reduced so as to contain only O_RED + D - N_DIR,ACT(k-2) non-zero HOA coefficient sequences, where N_DIR,ACT(k-2) = |I_DIR,ACT(k-2)| is the number of active directional signals in frame k-2.
- the final ambient HOA representation with the reduced number of O_RED + D - N_DIR,ACT(k-2) non-zero coefficient sequences is denoted by C_AMB,RED(k-2).
- the indices of the chosen ambient HOA coefficient sequences are output in the data set I_AMB,ACT(k-2).
- the active directional signals contained in X_DIR(k-2) and the HOA coefficient sequences contained in C_AMB,RED(k-2) are assigned to the frame Y(k-2) of I channels for individual perceptual encoding.
- the data set (a subset of {1, ..., D}) of indices of directional signals and the data set of corresponding direction value estimates from the estimation step/stage 13 are fed to a step or stage 18 that encodes the directional side information as described in the following.
- Step/stage 18 outputs a vector a(k) denoting which directional signals are active in frame k, as well as a coded representation of all directions.
- the values of this coded representation can be entropy encoded.
- Step/stage 34 receives the vector a(k) denoting which directional signals are active in frame k, and the coded representation of all directions. Step/stage 34 decodes this directional side information as described below and outputs the data set (a subset of {1, ..., D}) of indices of directional signals and the set of corresponding direction estimates. In step or stage 31, a perceptual decoding of the I received signals is performed in order to obtain the I decoded signals in Ŷ(k-2).
- the perceptually decoded signals in Ŷ(k-2) are redistributed in order to recreate the frame X̂_DIR(k-2) of directional signals and the frame Ĉ_AMB,RED(k-2) of the ambient HOA component.
- the information about how to redistribute the signals is obtained by reproducing the assigning operation performed for the HOA compression, using the index data sets Ĩ_DIR,ACT(k) and I_AMB,ACT(k-2).
- in composition step or stage 33, a current frame Ĉ(k-3) of the desired total HOA representation is re-composed using the frame X̂_DIR(k-2) of the directional signals, the set of the active directional signal indices together with the set of the corresponding directions, the parameters ζ(k-2) for predicting portions of the HOA representation from the directional signals, and the frame Ĉ_AMB,RED(k-2) of HOA coefficient sequences of the reduced ambient HOA component.
- I.e., directional signals with respect to uniformly distributed directions are predicted from the directional signals (X̂_DIR(k-2)) using the received parameters (ζ(k-2)) for such prediction, and thereafter the current decompressed frame (Ĉ(k-3)) is re-composed from the frame of directional signals (X̂_DIR(k-2)), the predicted portions and the reduced ambient HOA component (Ĉ_AMB,RED(k-2)).
- the directional HOA component for the k-th frame is represented by a number D ACT ( k ) of directional signals and additional side information.
- the side information consists of the set {i_ACT,d(k) | d = 1, ..., D_ACT(k)} of indices i_ACT,d(k) of directional signals that have been detected.
- the side information further consists of the set G_Ω,ACT(k) := {Ω_ACT,d(k) | d = 1, ..., D_ACT(k)} of the corresponding directions Ω_ACT,d(k).
- Fig. 4 illustrates an exemplary result of the direction estimation for the first 7 frames (cf. Fig. 3 and the corresponding description of the representation of a direction in a spherical coordinate system).
- the dots in Fig. 4 represent a grid of possible directions.
- the direction estimates related to the directional signal with index 1 are marked by diamonds and the direction estimates related to the directional signal with index 2 are marked by crosses.
- This activity information is additionally illustrated in Fig. 5, which shows for each frame index k whether the direction with the respective index is active (indicated by white) or not (indicated by black).
- a ( k ) will contain e.g. 4 to 8 bits per frame.
- a current frame k will contain none, part or all of this set of pre-determined directional signals.
- the current vector a ( k ) is transferred to the decoder or decompression side.
- the coded direction value Ω_ACT,1(k) corresponds to the index indicated by the first non-zero element in a(k).
- the coded direction value Ω_ACT,2(k) is assumed to correspond to the index indicated by the second non-zero element in a(k), etc.
- the coded representation of all direction values in the set is transferred to the decoder or decompression side.
- the disadvantage of such individual or specific quantisation processing is that likely such specifically quantised direction values will not exactly match with the pre-defined grid of directions: in order to not introduce errors when carrying out in the HOA decompressor the re-synthesis of the HOA representation of the directional signals due to direction quantisation errors, the extraction of directional signals from the given HOA representation in the HOA compressor in step/stage 13 in Fig. 1 is based on that pre-defined grid of directions.
- Patent applications EP 12306569.0 and EP 13305558.2 describe how directional signals can be extracted from an HOA representation.
- the problem that the directions of the quantised direction values do not exactly match with the estimated directions can be solved in a first embodiment by exploiting the fact that the splitting into directional and ambient residual components and the direction estimation described in patent applications EP 12306569.0 and EP 13305558.2 is based on a direction search which is carried out on a fixed grid of directions (cf. patent application EP 13305156.5 for a description of direction estimation as an example).
- a fixed grid represents the above-mentioned re-quantisation.
- the estimated direction Ω_ACT,d(k) is an element of a set {Ω_q | q = 1, ..., Q} of Q predefined directions.
- Such coding of the directions offers the further advantage that it is not recursive, meaning that no knowledge of the direction estimates from previous frames is required for the decoding of the directions.
- a disadvantage of such processing is that in general it does not achieve the related minimum possible average bit rate.
- the possibly entropy encoded representation of all directions is received, wherein these direction values were quantised according to said pre-defined grid, and vector a ( k ) is received that comprises the encoded indices about which directions from the set of pre-defined directions are present in the current audio signal frame C ( k ). If necessary, an entropy decoding takes place.
- the quantised direction values are re-quantised according to said pre-defined grid, and vector a(k) is decoded.
- the average bit rate for the coding of the directions for successive frames is further reduced by exploiting the relation between the direction estimates of successive frames.
- the direction estimation as proposed in patent application EP 13305558.2 is based on a sound source movement model, which predicts the direction of a sound source in the k-th frame based on its movement between the (k-2)-th and (k-1)-th frame.
- the quantised direction values (e.g. the direction index as proposed in the first embodiment) of the k -th frame are in the second embodiment coded using entropy coding like e.g. Huffman coding.
- the individual code words for the direction values have a variable bit size depending on the frame adaptively determined probability of the individual directions.
- direction values with a high probability are coded using small-size code words and direction values with a low probability are coded using large-size code words.
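The relation between probability and code word size stated above can be illustrated with a textbook Huffman construction. This is only a minimal sketch: the function name `huffman_code` and the example probabilities are hypothetical, and the patent text does not prescribe this particular construction or any specific probability values.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Build a prefix code for quantised direction indices.

    probabilities: dict mapping direction index -> probability (illustrative values).
    Returns a dict mapping direction index -> bit string.
    """
    tie_breaker = count()  # keeps heap entries comparable when probabilities are equal
    heap = [(p, next(tie_breaker), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single possible direction still needs a one-bit code word.
        return {symbol: "0" for symbol in heap[0][2]}
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)  # least probable subtree
        p1, _, codes1 = heapq.heappop(heap)  # second least probable subtree
        merged = {symbol: "0" + code for symbol, code in codes0.items()}
        merged.update({symbol: "1" + code for symbol, code in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie_breaker), merged))
    return heap[0][2]


# Hypothetical probabilities for four grid directions.
codes = huffman_code({0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125})
print({q: len(code) for q, code in codes.items()})
```

Both the compressor and the decompressor would have to build the table from identical probabilities, which is why the text requires the probability computation to be reproduced at the decompression side.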
- Such a coding strategy requires computation of the probability for the individual directions during HOA decompression in the same way as for the HOA compression.
- the received entropy encoded quantised direction values are entropy decoded, wherein a probability of the individual directions is determined frame-adaptively.
- this requires a high computational complexity in the HOA decompressor for computing the probabilities of the Q possible directions in each frame.
- the processing is recursive, meaning that the decoding of a direction at decompression side is based on the knowledge of the directions from the previous two frames.
- the number of possible probabilities is constrained to Q by making the probabilities dependent on the corresponding direction estimate in the last frame.
- the conditional probabilities are set inversely proportional to the angular distance between a direction estimate in the current frame and the corresponding direction estimate in the last frame.
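A minimal sketch of such a probability assignment follows. It assumes that each direction is given as an (inclination, azimuth) pair in radians, with the inclination measured from the z-axis. The small offset `epsilon`, which avoids a division by zero when a grid direction coincides with the previous estimate, is an assumption of this sketch and not part of the source text. The function names are hypothetical.

```python
from math import acos, cos, pi, sin


def angular_distance(direction_a, direction_b):
    """Angle in radians between two directions given as (inclination, azimuth)."""
    theta_a, phi_a = direction_a
    theta_b, phi_b = direction_b
    cosine = cos(theta_a) * cos(theta_b) + sin(theta_a) * sin(theta_b) * cos(phi_a - phi_b)
    # Clamp against rounding errors before taking the arc cosine.
    return acos(max(-1.0, min(1.0, cosine)))


def conditional_probabilities(grid, previous_direction, epsilon=1e-3):
    """Probability of each grid direction given the direction estimate of the last frame.

    Weights are inversely proportional to (angular distance + epsilon) and are
    normalised so that they sum to one. epsilon is an assumption of this sketch.
    """
    weights = [1.0 / (angular_distance(g, previous_direction) + epsilon) for g in grid]
    total = sum(weights)
    return [w / total for w in weights]


# Three hypothetical grid directions on the horizontal plane.
grid = [(pi / 2, 0.0), (pi / 2, pi / 2), (pi / 2, pi)]
probabilities = conditional_probabilities(grid, previous_direction=(pi / 2, 0.1))
print(probabilities)
```

Grid directions close to the previous estimate receive the largest probabilities, so an entropy coder driven by this table would assign them the shortest code words.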
- Another possibility is to measure the conditional a-priori probabilities of the direction estimates from some HOA representations instead of setting them.
- the received entropy encoded quantised direction values are frame adaptively entropy decoded depending on a probability of the individual directions, whereby the number of possible probabilities is constrained to the number of directions in the pre-defined grid and the probabilities are dependent on the corresponding direction estimate in the last frame.
- an entropy decoding table e.g. a Huffman table
- the non-conditional probabilities of the quantised directions are employed.
- Such probabilities can be measured from some test HOA representations, or can be assigned according to expectations about typical HOA sound field representation. For example, high probabilities are assigned for directions in the front and low probabilities for directions in the back.
- Such a processing has the advantage of not being recursive, i.e. the decoding of a direction value is not based on the knowledge of any directions from previous frames.
- the efficiency of this kind of processing is likely lower than that of the third embodiment.
- the mode decision can be indicated by a Boolean variable which is prepended to the coded representation of a direction. Such a mode decision will in most cases minimise the bit amount of the corresponding code.
- inventive processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the inventive processing.
- the invention can be applied in any application where some directional information has to be efficiently coded, e.g. object based 3D audio where directional signals and object based side information have to be coded.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Stereophonic System (AREA)
Abstract
A Higher Order Ambisonics (HOA) representation causes a high data rate. Thus, compression of HOA representations is desirable, for example by decomposing the HOA representation into a directional component and a residual ambient component. The directional component requires directional side information. The overall compression can be improved by coding (18) that directional side information using a specific quantisation of dominant signal direction values, and by establishing a vector (a(k)) that defines which directions from a set of pre-defined directions are present in a current audio signal frame (C(k)).
Description
- The invention relates to a method and to an apparatus for encoding and for decoding directional side information for a 3D audio signal.
- Higher Order Ambisonics (HOA) represents three-dimensional sound. Other techniques are wave field synthesis (WFS) or channel based approaches like 22.2. In contrast to channel based methods, however, the HOA representation offers the advantage of being independent of a specific loudspeaker set-up. But this flexibility is at the expense of a decoding process which is required for the playback of the HOA representation on a particular loudspeaker set-up. Compared to the WFS approach, where the number of required loudspeakers is usually very large, HOA may also be rendered to set-ups consisting of only few loudspeakers. A further advantage of HOA is that the same representation can also be employed without any modification for binaural rendering to headphones.
- HOA is based on the representation of the spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion. Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function. Hence, without loss of generality, the complete HOA sound field representation actually can be assumed to consist of O time domain functions, where O denotes the number of expansion coefficients. These time domain functions will be equivalently referred to as HOA coefficient sequences or as HOA channels in the following. An HOA representation can be expressed as a temporal sequence of HOA data frames containing HOA coefficients.
- The spatial resolution of the HOA representation improves with a growing maximum order N of the expansion. Unfortunately, the number of expansion coefficients O grows quadratically with the order N, in particular O = (N+1)^2. For example, typical HOA representations using order N = 4 require O = 25 HOA (expansion) coefficients. Accordingly, the total bit rate for the transmission of an HOA representation, given a desired single-channel sampling rate f_S and the number of bits N_b per sample, is determined by O·f_S·N_b. Consequently, transmitting an HOA representation of order N = 4 with a sampling rate of f_S = 48 kHz employing N_b = 16 bits per sample results in a bit rate of 19.2 Mbit/s, which is very high for many practical applications such as streaming.
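The arithmetic of this paragraph can be checked with a few lines of Python. The function names are hypothetical and serve only to reproduce the figures quoted in the text.

```python
def hoa_coefficient_count(order):
    """Number of HOA coefficient sequences O = (N + 1)^2 for expansion order N."""
    return (order + 1) ** 2


def hoa_bit_rate(order, sample_rate_hz, bits_per_sample):
    """Uncompressed bit rate O * f_S * N_b in bits per second."""
    return hoa_coefficient_count(order) * sample_rate_hz * bits_per_sample


rate = hoa_bit_rate(order=4, sample_rate_hz=48_000, bits_per_sample=16)
print(hoa_coefficient_count(4))  # 25
print(rate)                      # 19200000
```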
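- Thus, compression of HOA representations is highly desirable. The compression of HOA sound field representations is proposed in patent applications EP 12305537.8, EP 12306569.0 and EP 13305558.2. These approaches have in common that they perform a sound field analysis and decompose the given HOA representation into a directional component and a residual ambient component. The resulting compressed representation comprises a number of quantised signals, resulting from the perceptual coding of the directional signals and relevant coefficient sequences of the ambient HOA component. It also comprises additional side information related to the quantised signals, which is necessary for the reconstruction of the HOA representation from its compressed version.
- A problem to be solved by the invention is to further improve the compression of HOA representations. This problem is solved by the methods disclosed in claims 1 and 10. Apparatuses utilising these methods are disclosed in claims 2 and 11.
- The invention deals with the coding of the side information related to the directional component, which additional compression is not addressed in the above-mentioned patent applications EP 12305537.8, EP 12306569.0 and EP 13305558.2. In order to efficiently code or compress a given HOA representation, it is analysed on a frame-by-frame basis and is decomposed into a directional component and a residual ambient component, whereby at compressor side the direction values are estimated based on a pre-defined grid of directions, and these direction values are used for the extraction of directional signals from the given HOA representation in the HOA compressor. The resulting indices of directional signals as well as the direction values are encoded in a particular manner.
- In principle, the inventive method is suited for encoding directional side information for a 3D audio signal, and includes the steps: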
- receiving a data set of dominant signal direction values for a current audio signal frame and a data set of indices of corresponding directional signals, wherein the dominant signal directions were estimated from candidates determined from a pre-defined grid of directions, and the determined dominant signal direction values were used for an extraction of said directional signals from said 3D audio signal;
- encoding said directional side information for said current audio signal frame by quantising, using said pre-defined grid, the direction values in said received data set of dominant signal directions, and by establishing a vector that defines which directions from a set of pre-defined directions are present in said current audio signal frame.
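The two encoding steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the directions are represented by their index in the pre-defined grid, each index is written with a fixed number of bits (the simplest variant, without entropy coding), and the function name `encode_directional_side_info` and its parameters are hypothetical.

```python
def encode_directional_side_info(active_directions, max_signals, grid_size):
    """Encode the directional side information of one frame.

    active_directions: dict mapping directional signal index (1..D) to the
                       index (0..Q-1) of its direction in the pre-defined grid.
    max_signals:       D, the maximum number of directional signals.
    grid_size:         Q, the number of directions in the pre-defined grid.
    Returns (a, codes): activity vector and one bit string per active signal.
    """
    for signal_index, grid_index in active_directions.items():
        if not 1 <= signal_index <= max_signals:
            raise ValueError("signal index outside 1..D")
        if not 0 <= grid_index < grid_size:
            raise ValueError("direction is not on the pre-defined grid")
    # Activity vector a(k): element d-1 is 1 if directional signal d is active.
    a = [1 if d in active_directions else 0 for d in range(1, max_signals + 1)]
    # Fixed-length code word size (an assumption of this sketch).
    bits = max(1, (grid_size - 1).bit_length())
    # Directions are emitted in ascending signal-index order, so that the
    # decoder can match them to the non-zero elements of a(k).
    codes = [format(active_directions[d], "0{}b".format(bits))
             for d in sorted(active_directions)]
    return a, codes


a, codes = encode_directional_side_info({2: 5}, max_signals=2, grid_size=8)
print(a, codes)  # [0, 1] ['101']
```

Because the direction estimates already lie on the grid, replacing each direction by its grid index introduces no additional quantisation error.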
- In principle the inventive apparatus is suited for encoding directional side information for a 3D audio signal, said apparatus including:
- means being adapted for encoding said directional side information for a current audio signal frame, which means receive a data set of dominant signal direction values for said current audio signal frame and a data set of indices of corresponding directional signals, wherein the dominant signal directions were estimated from candidates determined from a pre-defined grid of directions, and the determined dominant signal direction values were used for an extraction of said directional signals from said 3D audio signal, and which means quantise, using said pre-defined grid, the direction values in said received data set of dominant signal directions, and establish a vector that defines which directions from a set of pre-defined directions are present in said current audio signal frame.
- In principle the inventive method is suited for decoding directional side information for a 3D audio signal which directional side information was encoded according to the above encoding method, and includes the steps:
- receiving for a current audio signal frame direction values quantised according to said pre-defined grid and a vector that comprises encoded indices about which directions from said set of pre-defined directions are present in said current audio signal frame;
- re-quantising according to said pre-defined grid said quantised direction values, and decoding said vector;
- providing from said re-quantised direction values and said decoded vector a data set of dominant signal direction values for said current audio signal frame and a data set of indices of corresponding directional signals.
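A matching decoder sketch is given below. It assumes the received direction values are fixed-length bit strings holding grid indices, and it pairs the d-th coded direction with the d-th non-zero element of the vector a(k). The function name `decode_directional_side_info` is hypothetical.

```python
def decode_directional_side_info(a, codes):
    """Recover the active signal indices and their grid direction indices.

    a:     activity vector a(k) with one 0/1 entry per directional signal.
    codes: bit strings of the coded grid indices, in ascending signal order.
    Returns a dict mapping directional signal index (1..D) -> grid index.
    """
    # Signal indices are the 1-based positions of the non-zero elements of a(k).
    active_indices = [d for d, flag in enumerate(a, start=1) if flag]
    if len(active_indices) != len(codes):
        raise ValueError("number of coded directions does not match a(k)")
    # The d-th coded direction belongs to the d-th non-zero element of a(k).
    return {d: int(code, 2) for d, code in zip(active_indices, codes)}


print(decode_directional_side_info([1, 0, 1], ["011", "101"]))  # {1: 3, 3: 5}
```

The decoding of a frame in this sketch uses only data from that frame, so it is not recursive.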
- In principle the inventive apparatus is suited for decoding directional side information for a 3D audio signal, which directional side information was encoded according to the above encoding method, said apparatus including means being adapted for:
- receiving for a current audio signal frame direction values quantised according to said pre-defined grid and a vector that comprises encoded indices about which directions from said set of pre-defined directions are present in said current audio signal frame;
- re-quantising according to said pre-defined grid said quantised direction values, and decoding said vector;
- providing from said re-quantised direction values and said decoded vector a data set of dominant signal direction values for said current audio signal frame and a data set of indices of corresponding directional signals.
- Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.
- Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
- Fig. 1
- Block diagram for an HOA compression including the encoding of directional side information;
- Fig. 2
- Block diagram for an HOA decompression including the decoding of directional side information;
- Fig. 3
- Spherical coordinate system;
- Fig. 4
- Exemplary illustration of direction estimation;
- Fig. 5
- Activity diagram related to Fig. 4.
- In order to compress a given HOA representation, it is analysed on a frame-by-frame basis and decomposed into a directional component and a residual ambient component, for example as described in patent applications EP 12305537.8, EP 12306569.0 and EP 13305558.2.
- As an example for embedding directional side information coding according to the invention in an HOA processing that uses splitting into directional and residual ambient components, Fig. 1 shows an HOA compression processing as described in patent application EP 13305558.2, in which, following estimation of dominant sound source directions, a coding of directional side information is carried out. For the HOA representation compression, a frame-wise processing with non-overlapping input frames C(k) of HOA coefficient sequences of length L is used, where k denotes the frame index. Step or stage 11/12 in Fig. 1 is optional and consists of concatenating the non-overlapping k-th and (k-1)-th frames of HOA coefficient sequences into a long frame C̃(k) as C̃(k) := [C(k-1) C(k)] (the tilde symbol indicates long overlapping frames). This long frame is 50% overlapped with an adjacent long frame and is successively used for the estimation of dominant sound source directions. If step/stage 11/12 is not present, the tilde symbol has no specific meaning.
- In step or stage 13, dominant sound sources are estimated. The estimation provides a data set, a subset of {1, ..., D}, of indices of directional signals that have been detected, as well as the set of corresponding direction estimates. D denotes the maximum number of directional signals, which has to be set before starting the HOA compression. In step or stage 14, the current frame C̃(k) of HOA coefficient sequences is decomposed into a number of directional signals X_DIR(k-2) belonging to the directions contained in the set of direction estimates and a residual ambient HOA component C_AMB(k-2). The delay of two frames is introduced as a result of overlap-add processing in order to obtain smooth signals. It is assumed that X_DIR(k-2) contains a total of D channels, of which, however, only those corresponding to the active directional signals are non-zero. The indices specifying these channels are assumed to be output in the data set I_DIR,ACT(k-2). Additionally, the decomposition in step/stage 14 provides some parameters ζ(k-2), which are used at the decompression side for predicting portions of the original HOA representation from the directional signals. In step or stage 15, the number of coefficients of the ambient HOA component C_AMB(k-2) is reduced so as to contain only O_RED + D - N_DIR,ACT(k-2) non-zero HOA coefficient sequences, where N_DIR,ACT(k-2) = |I_DIR,ACT(k-2)| indicates the cardinality of the data set I_DIR,ACT(k-2), i.e. the number of active directional signals in frame k-2. Since the ambient HOA component is assumed to be always represented by a minimum number O_RED of HOA coefficient sequences, this problem can be reduced to the selection of the remaining D - N_DIR,ACT(k-2) HOA coefficient sequences out of the possible O - O_RED ones. In order to obtain a smooth reduced ambient HOA representation, this choice is made such that, compared to the choice taken at the previous frame k-3, as few changes as possible occur.
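The long-frame construction of step/stage 11/12 described above can be sketched as follows. The sketch assumes that a frame is stored as a list of O rows (one per HOA coefficient sequence) with L samples each. The function name `concatenate_frames` and the sample values are hypothetical.

```python
def concatenate_frames(previous_frame, current_frame):
    """Form the long frame [C(k-1) C(k)] by joining each coefficient sequence in time."""
    if len(previous_frame) != len(current_frame):
        raise ValueError("frames must hold the same number of coefficient sequences")
    return [prev_row + cur_row for prev_row, cur_row in zip(previous_frame, current_frame)]


# Three consecutive frames with O = 2 coefficient sequences of length L = 2.
c0 = [[1, 2], [3, 4]]
c1 = [[5, 6], [7, 8]]
c2 = [[9, 10], [11, 12]]

long_1 = concatenate_frames(c0, c1)  # [[1, 2, 5, 6], [3, 4, 7, 8]]
long_2 = concatenate_frames(c1, c2)  # [[5, 6, 9, 10], [7, 8, 11, 12]]
print(long_1, long_2)
```

The second half of `long_1` equals the first half of `long_2`, which is the 50% overlap between adjacent long frames mentioned in the text.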
- The final ambient HOA representation with the reduced number of O_RED + D - N_DIR,ACT(k-2) non-zero coefficient sequences is denoted by C_AMB,RED(k-2). The indices of the chosen ambient HOA coefficient sequences are output in the data set I_AMB,ACT(k-2). In step/stage 16, the active directional signals contained in X_DIR(k-2) and the HOA coefficient sequences contained in C_AMB,RED(k-2) are assigned to the frame Y(k-2) of I channels for individual perceptual encoding.
- According to the present invention, the data set (a subset of {1, ..., D}) of indices of directional signals and the data set of corresponding direction value estimates from the estimation step/stage 13 are fed to a step or stage 18 that encodes the directional side information as described in the following. Step/stage 18 outputs a vector a(k) denoting which directional signals are active in frame k, as well as a coded representation of all directions. The values of this coded representation can be entropy encoded.
- The HOA decompression processing described in patent application EP 13305558.2, extended by a step or stage 34 for decoding the received encoded directional side information, is depicted in Fig. 2. Step/stage 34 receives the vector a(k) denoting which directional signals are active in frame k, and the coded representation of all directions. Step/stage 34 decodes this directional side information as described below and outputs the data set (a subset of {1, ..., D}) of indices of directional signals and the set of corresponding direction estimates.
In step or stage 31 a perceptual decoding of the I received encoded signals is performed in order to obtain the I decoded signals in Ŷ(k-2). In signal re-distributing step or stage 32, the perceptually decoded signals in Ŷ(k-2) are redistributed in order to recreate the frame X̂ DIR(k-2) of directional signals and the frame Ĉ AMB,RED(k-2) of the ambient HOA component. The information about how to re-distribute the signals is obtained by reproducing the assigning operation performed for the HOA compression, using the index data sets Ĩ DIR,ACT(k) and I AMB,ACT(k-2).
- In composition step or stage 33, a current frame Ĉ(k-3) of the desired total HOA representation is re-composed using the frame X̂ DIR(k-2) of the directional signals, the set Ĩ DIR,ACT(k) of the active directional signal indices together with the set G̃ Ω,ACT(k) of the corresponding directions, the parameters ζ(k-2) for predicting portions of the HOA representation from the directional signals, and the frame Ĉ AMB,RED(k-2) of HOA coefficient sequences of the reduced ambient HOA component. I.e., directional signals with respect to uniformly distributed directions are predicted from the directional signals (X̂ DIR(k-2)) using the received parameters (ζ(k-2)) for such prediction, and thereafter the current decompressed frame (Ĉ(k-3)) is re-composed from the frame of directional signals (X̂ DIR(k-2)), the predicted portions and the reduced ambient HOA component (Ĉ AMB,RED(k-2)).
- As mentioned above, in patent application
EP 13305558.2 - To illustrate the meaning of the side information by way of an example, the case is considered where the maximum number D of directional signals is equal to two.
Fig. 4 illustrates an exemplary result of the direction estimation for the first 7 frames (cf. Fig. 3 and the corresponding description of the representation of a direction in a spherical coordinate system). The dots in Fig. 4 represent a grid of possible directions. The direction estimates related to the directional signal with index 1 are marked by diamonds and the direction estimates related to the directional signal with index 2 are marked by crosses. The directional signal with index 1, representing a first trajectory, is supposed to be active from frame k = 1 to k = 4, whereas the directional signal with index 2, representing a second trajectory, is supposed to be active from frame k = 3 to k = 7.
- This activity information is additionally illustrated in Fig. 5, which shows for each frame index k whether the direction with the respective index is active (indicated by white) or not (indicated by black). The resulting index sets Ĩ DIR,ACT(k) for k = 1,2,...,7 corresponding to Fig. 4 are summarised in Table 1:
k | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
Ĩ DIR,ACT(k) | {1} | {1} | {1,2} | {1,2} | {2} | {2} | {2} |
- Because the indices of active directional signals correspond to the indices of D channels to which these directional signals are assigned, for coding the indices of the directional signals a bit array of length D is used that is represented by the vector a(k) = [a 1(k), ..., a D(k)], where a d(k) = 1 if d ∈ Ĩ DIR,ACT(k) and a d(k) = 0 otherwise.
- There are exactly D ACT (k) non-zero elements in vector a (k), which corresponds to the number of active directional signals.
- For the compression of 3D audio signals it is reasonable to assume not more than four to eight directional signals in a frame, and therefore a(k) will contain e.g. 4 to 8 bits per frame. A current frame k will contain none, part or all of this set of pre-determined directional signals. For a current frame k the current vector a (k) is transferred to the decoder or decompression side.
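The construction of the activity vector can be illustrated with a minimal Python sketch. The function name `activity_vector` is hypothetical and only serves this illustration; the index sets are those of Table 1 with D = 2.

```python
def activity_vector(active_indices, D):
    """Return the length-D bit list a(k).

    Element d-1 is 1 if the directional signal with index d (1-based)
    is active in the frame, and 0 otherwise.
    """
    return [1 if d in active_indices else 0 for d in range(1, D + 1)]


# Index sets of Table 1 for D = 2 and frames k = 1..7.
table1 = [{1}, {1}, {1, 2}, {1, 2}, {2}, {2}, {2}]
vectors = [activity_vector(index_set, 2) for index_set in table1]

# The number of non-zero elements equals the number of active signals.
active_counts = [sum(a) for a in vectors]
```

With D = 2 each frame costs two bits for this vector, independently of how many directional signals are active.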
-
- The direction values used for a frame may vary from frame to frame. Assuming that the indices i ACT,d(k), d = 1,...,D ACT(k) are sorted in ascending order, it is sufficient to code the direction values Ω ACT,d(k), d = 1,...,D ACT(k) one after the other in order to be able to unambiguously link them to the indices. In other words, given the vector a(k) and the sequence of coded directions, the coded direction value Ω ACT,1(k) corresponds to the index indicated by the first non-zero element in a(k), the coded direction value Ω ACT,2(k) corresponds to the index indicated by the second non-zero element in a(k), etc. As mentioned above, for frame k the coded representation of all direction values in the set G̃ Ω,ACT(k) is transferred to the decoder or decompression side.
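The implicit linking of coded directions to channel indices can be sketched as follows. This is an illustration only: `link_directions` is a hypothetical helper, and the direction values are represented here by arbitrary integers.

```python
def link_directions(a, coded_directions):
    """Pair each coded direction with the index of a non-zero element of a(k).

    `a` is the activity vector as a list of bits; indices are 1-based and
    visited in ascending order, matching the order in which the direction
    values are assumed to have been coded.
    """
    active = [d for d, bit in enumerate(a, start=1) if bit]
    if len(active) != len(coded_directions):
        raise ValueError("number of directions must equal number of set bits")
    return dict(zip(active, coded_directions))


# Example with D = 4: signals 1, 3 and 4 are active.
linked = link_directions([1, 0, 1, 1], [17, 402, 9])
```

Because the order is fixed by convention, no index has to be transmitted alongside each direction value.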
- In the following, the problem of how to efficiently encode the direction values Ω ACT,d(k), d = 1,...,D ACT(k) for generating the coded representation is addressed. In principle, assuming a spherical coordinate system as shown in Fig. 3, each direction Ω ACT,d(k) can be unambiguously represented by the tuple of its inclination angle θ and its azimuth angle φ.
- On one hand, the inclination and azimuth angles could be quantised individually, in particular by assuming Mθ = 2^Qθ possible discrete values for the inclination angle and Mφ = 2^Qφ possible discrete values for the azimuth angle, resulting in a total number of Qθ + Qφ bits required for the coding of a single direction. On the other hand, the disadvantage of such individual or specific quantisation processing is that such specifically quantised direction values will likely not exactly match the pre-defined grid of directions: in order to not introduce errors due to direction quantisation when carrying out the re-synthesis of the HOA representation of the directional signals in the HOA decompressor, the extraction of directional signals from the given HOA representation in the HOA compressor in step/stage 13 in Fig. 1 is based on that pre-defined grid of directions. Patent applications EP 12306569.0 EP 13305558.2 - The problem that the directions of the quantised direction values do not exactly match with the estimated directions can be solved in a first embodiment by exploiting the fact that the splitting into directional and ambient residual components and the direction estimation described in patent applications
EP 12306569.0 EP 13305558.2 EP 13305156.5 stage 18 can be quantised according to this pre-defined grid, by representing a direction by the index q ∈ {1,...,Q}. Then, a quantised representation of a single direction will require ⌈log2(Q)⌉ bits. For instance, using a grid consisting of Q = 900 predefined directions would require 10 bits for a corresponding quantisation.
- Such coding of the directions offers the further advantage that it is not recursive, meaning that no knowledge of the direction estimates from previous frames is required for the decoding of the directions. However, a disadvantage of such processing is that in general it does not achieve the related minimum possible average bit rate.
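The bit count and the grid-index representation of the first embodiment can be sketched in Python. The helper names and the small three-point grid are assumptions made for this illustration; the nearest-neighbour search is only needed for a direction that is not already a grid point, whereas the text above assumes the estimates already lie on the grid.

```python
import math


def bits_per_direction(Q):
    """Number of bits ceil(log2(Q)) for a fixed-length index into Q directions."""
    return (Q - 1).bit_length()


def unit_vector(theta, phi):
    """Cartesian unit vector for inclination theta and azimuth phi (radians)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def quantise_to_grid(direction, grid):
    """Return the 1-based index q of the grid direction closest in angle."""
    v = unit_vector(*direction)

    def cosine(g):
        u = unit_vector(*g)
        return sum(x * y for x, y in zip(u, v))

    # The largest cosine corresponds to the smallest angular distance.
    best = max(range(len(grid)), key=lambda i: cosine(grid[i]))
    return best + 1


n_bits = bits_per_direction(900)  # 10 bits for Q = 900

# Hypothetical toy grid: zenith, x-axis and y-axis directions.
toy_grid = [(0.0, 0.0), (math.pi / 2, 0.0), (math.pi / 2, math.pi / 2)]
q = quantise_to_grid((1.5, 0.1), toy_grid)
```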
- At decompressor side, for a current audio signal frame C(k), the possibly entropy encoded representation of all directions is received, wherein these direction values were quantised according to said pre-defined grid, and vector a(k) is received that comprises the encoded indices about which directions from the set of pre-defined directions are present in the current audio signal frame C(k). If necessary, an entropy decoding takes place. The quantised direction values are re-quantised according to said pre-defined grid, and vector a(k) is decoded. From the re-quantised direction values and the decoded vector, a data set G̃ Ω,ACT(k) of dominant signal direction values for said current audio signal frame C(k) and a data set Ĩ DIR,ACT(k) of indices of corresponding directional signals are provided.
- In a second embodiment, the average bit rate for the coding of the directions for successive frames is further reduced by exploiting the relation between the direction estimates of successive frames. In particular, the direction estimation as proposed in patent application
EP 13305558.2 - Therefore the quantised direction values (e.g. the direction index as proposed in the first embodiment) of the k-th frame are in the second embodiment coded using entropy coding such as Huffman coding. The individual code words for the direction values have a variable bit size depending on the frame-adaptively determined probability of the individual directions. In particular, direction values with a high probability are coded using small-size code words and direction values with a low probability are coded using large-size code words.
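How probabilities translate into variable-size code words can be illustrated with a generic Huffman construction. This is a textbook sketch, not the specific code design of the application; the function name and the four-direction probability table are assumptions for the example.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Build a prefix code from a dict mapping symbol -> probability.

    Returns a dict mapping each symbol to its code word as a bit string.
    """
    if len(probabilities) == 1:
        return {symbol: "0" for symbol in probabilities}
    tie = count()  # unique tie-breaker so that dicts are never compared
    heap = [(p, next(tie), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)
        p1, _, code1 = heapq.heappop(heap)
        # Merge the two least probable subtrees, prefixing one bit to each.
        merged = {s: "0" + w for s, w in code0.items()}
        merged.update({s: "1" + w for s, w in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


# Hypothetical probabilities for four direction indices in one frame.
probs = {1: 0.6, 2: 0.2, 3: 0.1, 4: 0.1}
code = huffman_code(probs)
average_bits = sum(probs[s] * len(code[s]) for s in probs)
```

In this toy case the most probable direction receives a one-bit code word, so the average code length falls below the two bits that a fixed-length index over four directions would need.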
- Such a coding strategy requires computation of the probability for the individual directions during HOA decompression in the same way as for the HOA compression. At decompression side, the received entropy encoded quantised direction values are entropy decoded, wherein an a-priori probability of the individual directions is determined frame-adaptively. However, this requires a high computational complexity in the HOA decompressor for computing the probabilities of the Q possible directions in each frame. Further, the processing is recursive, meaning that the decoding of a direction at decompression side is based on the knowledge of the directions from the previous two frames.
- In a third embodiment, in order to reduce the computational complexity introduced by evaluating frame-by-frame the a-priori probabilities for all Q possible directions in the HOA decompressor as described for the second embodiment, the number of possible probabilities is constrained to Q by making the probabilities dependent on the corresponding direction estimate in the last frame.
- One possibility to define such conditional probabilities is to set them inversely proportional to the angular distance between a direction estimate in the current frame and the corresponding direction estimate in the last frame. Another possibility is to measure the conditional a-priori probabilities of the direction estimates from some HOA representations instead of setting them.
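The first possibility, probabilities inversely proportional to the angular distance, can be sketched as follows. The offset `eps` is an assumption added here to keep the weight finite when the direction does not change between frames; the text above does not specify how a zero distance is handled, and the function names are hypothetical.

```python
import math


def angular_distance(a, b):
    """Angle in radians between two directions given as (inclination, azimuth)."""
    (t1, p1), (t2, p2) = a, b
    c = (math.cos(t1) * math.cos(t2)
         + math.sin(t1) * math.sin(t2) * math.cos(p1 - p2))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding errors


def conditional_probabilities(previous, grid, eps=0.05):
    """Probability of each grid direction given the previous frame's estimate.

    Weights are 1 / (angular distance + eps), normalised to sum to one.
    """
    weights = [1.0 / (angular_distance(previous, g) + eps) for g in grid]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy grid of three directions; previous estimate is the first one.
toy_grid = [(0.5, 0.0), (0.5, 1.0), (2.0, 3.0)]
probs = conditional_probabilities(toy_grid[0], toy_grid)
```

Since the table depends only on the previous direction, it can be computed once per grid point and stored, which is the storage requirement discussed for the decompressor.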
- At decompression side, the received entropy encoded quantised direction values are frame-adaptively entropy decoded depending on an a-priori probability of the individual directions, whereby the number of possible probabilities is constrained to the number of directions in the pre-defined grid and the probabilities are dependent on the corresponding direction estimate in the last frame. Such a technique requires that the HOA decompressor holds for each of the Q possible test directions an entropy decoding table (e.g. a Huffman table) containing Q code words and respective indices of the temporally following directions.
- In a fourth embodiment, instead of considering the conditioned probabilities for the entropy coding, the non-conditional probabilities of the quantised directions are employed. Such probabilities can be measured from some test HOA representations, or can be assigned according to expectations about typical HOA sound field representation. For example, high probabilities are assigned for directions in the front and low probabilities for directions in the back.
- Such a processing has the advantage of not being recursive, i.e. the decoding of a direction value is not based on the knowledge of any directions from previous frames. However, due to the use of non-conditional probabilities, in general the efficiency of this kind of processing is likely lower than that of the third embodiment.
- In a fifth embodiment, in order to alleviate computational load and extensive storage requirements for the HOA decompressor, for each frame it is decided which one of the above embodiments is used, resulting in a set of four (or fewer) modes:
- processing according to the first embodiment;
- processing according to a combination of the first embodiment and the second embodiment;
- processing according to a combination of the first embodiment and the third embodiment;
- processing according to a combination of the first embodiment and the fourth embodiment.
- The mode decision can be indicated by a Boolean variable which is prepended to the coded representation of a direction. Such a mode decision will in most cases minimise the bit amount of the corresponding code.
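A mode decision of this kind can be sketched for the simplest case of two modes, where a single flag bit suffices. The sketch assumes a hypothetical entropy code table `code` mapping direction indices to bit strings and chooses, per direction, whichever representation is shorter; how the application selects among more than two modes is not specified in the text above.

```python
def encode_direction(q, Q, code):
    """Return flag bit plus payload for grid index q (1-based) out of Q.

    Flag "0": fixed-length index with ceil(log2(Q)) bits (first embodiment).
    Flag "1": variable-length code word taken from `code`, if shorter.
    """
    n_bits = (Q - 1).bit_length()
    fixed = format(q - 1, "0{}b".format(n_bits))
    variable = code.get(q)
    if variable is not None and len(variable) < len(fixed):
        return "1" + variable
    return "0" + fixed


# Hypothetical code table that only covers one likely direction.
toy_code = {5: "01"}
short = encode_direction(5, 900, toy_code)   # entropy-coded mode
fallback = encode_direction(6, 900, toy_code)  # fixed-length mode
```

The flag costs one extra bit per direction, which is why the text states that the decision minimises the bit amount in most cases rather than in all cases.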
- The inventive processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the inventive processing.
- The invention can be applied in any application where some directional information has to be efficiently coded, e.g. object based 3D audio where directional signals and object based side information have to be coded.
Claims (16)
- Method for encoding directional side information for a 3D audio signal, characterised by the steps:- receiving a data set of dominant signal direction values for a current audio signal frame (C(k)) and a data set (Ĩ DIR,ACT(k)) of indices of corresponding directional signals, wherein the dominant signal directions were estimated (13) from candidates determined from a pre-defined grid of directions, and the determined dominant signal direction values were used for an extraction (13) of said directional signals from said 3D audio signal;- encoding (18) said directional side information for said current audio signal frame (C(k)) by quantising, using said pre-defined grid, the direction values in said received data set of dominant signal directions, and by establishing a vector ( a (k)) that defines which directions from a set of pre-defined directions are present in said current audio signal frame ( C (k)).
- Apparatus for encoding directional side information for a 3D audio signal, said apparatus including:means (18) being adapted for encoding (18) said directional side information for a current audio signal frame ( C (k)), which means receive a data set (G̃ Ω,ACT(k)) of dominant signal direction values for said current audio signal frame (C(k)) and a data set (Ĩ DIR,ACT(k)) of indices of corresponding directional signals, wherein the dominant signal directions were estimated (13) from candidates determined from a pre-defined grid of directions, and the determined dominant signal direction values were used for an extraction (13) of said directional signals from said 3D audio signal,
- Method according to claim 1, or apparatus according to claim 2, wherein frame adaptively an a-priori probability of the individual directions is determined based on the knowledge of the directions from the previous two frames, and wherein said quantised direction values are coded using entropy coding with variable bit size code words for the direction values depending on said a-priori probability.
- Method according to claim 1, or apparatus according to claim 2, wherein frame adaptively an a-priori probability of the individual directions is determined whereby the number of possible a-priori probabilities is constrained to the number of directions in said pre-defined grid and said a-priori probabilities are dependent on the corresponding direction estimate in the last frame, and wherein said quantised direction values are coded using entropy coding with variable bit size code words for the direction values depending on said a-priori probability.
- Method according to the method of claim 4, or apparatus according to the apparatus of claim 4, wherein said conditional probabilities are inversely proportional to the angular distance between a direction estimate in the current frame and the corresponding direction estimate in the last frame.
- Method according to the method of claim 4, or apparatus according to the apparatus of claim 4, wherein said conditional a-priori probabilities are measured from several HOA representations.
- Method according to claim 1, or apparatus according to claim 2, wherein said quantised direction values are coded using entropy coding with variable bit size code words for the direction values depending on non-conditional a-priori probabilities.
- Method according to the method of one of claims 1, 3, 4 or 7, or apparatus according to the apparatus of one of claims 2, 3, 4 or 7, wherein for each 3D audio signal frame (C(k)) it is decided which one of the processings according to claims 1, 3, 4 or 7 is carried out, and a corresponding mode code word representing said selection is provided.
- Method according to the method of one of claims 1 and 3 to 8, or apparatus according to the apparatus of one of claims 2 to 8, wherein said 3D audio signal is an HOA audio signal.
- Computer program product comprising instructions which, when carried out on a computer, perform the method according to one of claims 1 and 3 to 9.
- Method for decoding directional side information for a 3D audio signal, which directional side information was encoded according to claim 1, characterised by the steps:- receiving for a current audio signal frame (C(k)) direction values quantised according to said pre-defined grid and a vector ( a (k)) that comprises encoded indices about which directions from said set of pre-defined directions are present in said current audio signal frame ( C (k));- re-quantising (34) according to said pre-defined grid said quantised direction values and decoding (34) said vector ( a (k));- providing (34) from said re-quantised direction values and said decoded vector a data set (G̃ Ω,ACT(k)) of dominant signal direction values for said current audio signal frame (C(k)) and a data set (Ĩ DIR,ACT(k)) of indices of corresponding directional signals.
- Apparatus for decoding directional side information for a 3D audio signal, which directional side information was encoded according to claim 1, said apparatus including means (34) being adapted for:- receiving for a current audio signal frame (C(k)) direction values quantised according to said pre-defined grid and a vector ( a (k)) that comprises encoded indices about which directions from said set of pre-defined directions are present in said current audio signal frame ( C (k));- re-quantising according to said pre-defined grid said quantised direction values and decoding said vector ( a (k));- providing from said re-quantised direction values and said decoded vector a data set (G̃ Ω,ACT(k)) of dominant signal direction values for said current audio signal frame (C(k)) and a data set (Ĩ DIR,ACT(k)) of indices of corresponding directional signals.
- Method according to claim 11, or apparatus according to claim 12, wherein said received quantised direction values are entropy encoded quantised direction values and frame adaptively an a-priori probability of the individual directions is determined based on the knowledge of the directions from the previous two frames, and wherein said entropy encoded quantised direction values are entropy decoded depending on said a-priori probability.
- Method according to claim 11, or apparatus according to claim 12, wherein said received quantised direction values are entropy encoded quantised direction values and frame adaptively an a-priori probability of the individual directions is determined whereby the number of possible a-priori probabilities is constrained to the number of directions in said pre-defined grid and said a-priori probabilities are dependent on the corresponding direction estimate in the last frame, and wherein said entropy encoded quantised direction values are entropy decoded with variable bit size code words depending on said a-priori probability.
- Method according to the method of one of claims 11, 13 and 14, or apparatus according to the apparatus of one of claims 12 to 14, wherein said 3D audio signal is an HOA audio signal.
- Method according to the method of claims 11 and 14, or apparatus according to the apparatus of claims 12 and 14, wherein said decoding of directional side information is carried out in an HOA decompressor and this HOA decompressor holds for each possible direction of said pre-defined grid an entropy decoding table containing a corresponding code word and respective indices of the temporally following directions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20130306391 EP2860728A1 (en) | 2013-10-09 | 2013-10-09 | Method and apparatus for encoding and for decoding directional side information |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2860728A1 (en) | 2015-04-15 |
Family
ID=49448078
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20130306391 Withdrawn EP2860728A1 (en) | 2013-10-09 | 2013-10-09 | Method and apparatus for encoding and for decoding directional side information |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP2860728A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019197713A1 (en) * | 2018-04-09 | 2019-10-17 | Nokia Technologies Oy | Quantization of spatial audio parameters |
WO2020089509A1 (en) * | 2018-10-31 | 2020-05-07 | Nokia Technologies Oy | Determination of spatial audio parameter encoding and associated decoding |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070269063A1 (en) * | 2006-05-17 | 2007-11-22 | Creative Technology Ltd | Spatial audio coding based on universal spatial cues |
US20090083044A1 (en) * | 2006-03-15 | 2009-03-26 | France Telecom | Device and Method for Encoding by Principal Component Analysis a Multichannel Audio Signal |
US20110249821A1 (en) * | 2008-12-15 | 2011-10-13 | France Telecom | encoding of multichannel digital audio signals |
Non-Patent Citations (3)
Title |
---|
HIRVONEN TONI ET AL: "Perceptual Compression Methods for Metadata in Directional Audio Coding Applied to Audiovisual Teleconference", AES CONVENTION 126; MAY 2009, AES, 60 EAST 42ND STREET, ROOM 2520 NEW YORK 10165-2520, USA, 7 May 2009 (2009-05-07), XP040508988 * |
JING WANG ET AL: "Context-based adaptive arithmetic coding in time and frequency domain for the lossless compression of audio coding parameters at variable rate", EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING, 21 May 2013 (2013-05-21), pages 1, XP055104567, Retrieved from the Internet <URL:http://asmp.eurasipjournals.com/content/pdf/1687-4722-2013-9.pdf> [retrieved on 20140226] * |
MANUEL BRIAND: "Codage paramétrique basé sur l'Analyse en Composante Principale", 20 March 2007 (2007-03-20), pages 133 - 147, XP002661237, Retrieved from the Internet <URL:http://tel.archives-ouvertes.fr/docs/00/14/18/62/PDF/2007_Briand_ParametricAudioCoding.pdf> [retrieved on 20101013] * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019197713A1 (en) * | 2018-04-09 | 2019-10-17 | Nokia Technologies Oy | Quantization of spatial audio parameters |
KR20200140874A (en) * | 2018-04-09 | 2020-12-16 | 노키아 테크놀로지스 오와이 | Quantization of spatial audio parameters |
EP3776545A4 (en) * | 2018-04-09 | 2022-01-05 | Nokia Technologies Oy | Quantization of spatial audio parameters |
US11475904B2 (en) | 2018-04-09 | 2022-10-18 | Nokia Technologies Oy | Quantization of spatial audio parameters |
WO2020089509A1 (en) * | 2018-10-31 | 2020-05-07 | Nokia Technologies Oy | Determination of spatial audio parameter encoding and associated decoding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20131009 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20151016 |