EP2860728A1 - Method and apparatus for encoding and for decoding directional side information

Info

Publication number: EP2860728A1
Application number: EP20130306391
Authority: EP
Inventors: Alexander Krüger; Sven Kordon; Oliver Wuebbolt
Original assignee: Thomson Licensing SAS
Current assignee: Thomson Licensing SAS
Priority date: 2013-10-09
Filing date: 2013-10-09
Publication date: 2015-04-15

Abstract

A Higher Order Ambisonics (HOA) representation causes a high data rate. Compression of HOA representations is therefore desirable and can be achieved by decomposing the HOA representation into a directional component and a residual ambient component. The directional component requires directional side information. The overall compression can be improved by coding (18) that directional side information using a specific quantisation of dominant signal direction values (G̃_Ω,ACT(k)), and by establishing a vector (a(k)) that defines which directions from a set of pre-defined directions are present in a current audio signal frame (C(k)).

Description

Technical field

The invention relates to a method and to an apparatus for encoding and for decoding directional side information for a 3D audio signal.

Background

Higher Order Ambisonics (HOA) represents three-dimensional sound. Other techniques are wave field synthesis (WFS) or channel based approaches like 22.2. In contrast to channel based methods, however, the HOA representation offers the advantage of being independent of a specific loudspeaker set-up. But this flexibility is at the expense of a decoding process which is required for the playback of the HOA representation on a particular loudspeaker set-up. Compared to the WFS approach, where the number of required loudspeakers is usually very large, HOA may also be rendered to set-ups consisting of only few loudspeakers. A further advantage of HOA is that the same representation can also be employed without any modification for binaural rendering to headphones.
HOA is based on the representation of the spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion. Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function. Hence, without loss of generality, the complete HOA sound field representation actually can be assumed to consist of O time domain functions, where O denotes the number of expansion coefficients. These time domain functions will be equivalently referred to as HOA coefficient sequences or as HOA channels in the following. An HOA representation can be expressed as a temporal sequence of HOA data frames containing HOA coefficients.
The spatial resolution of the HOA representation improves with a growing maximum order N of the expansion. Unfortunately, the number of expansion coefficients O grows quadratically with the order N, in particular O = (N+1)². For example, typical HOA representations using order N = 4 require O = 25 HOA (expansion) coefficients. Accordingly, the total bit rate for the transmission of an HOA representation, given a desired single-channel sampling rate f_S and the number of bits N_b per sample, is determined by O·f_S·N_b. Consequently, transmitting an HOA representation of order N = 4 with a sampling rate of f_S = 48 kHz employing N_b = 16 bits per sample results in a bit rate of 19.2 Mbit/s, which is very high for many practical applications such as streaming.
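The bit rate arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are hypothetical and not part of the application.

```python
def hoa_coefficient_count(order: int) -> int:
    """Number of HOA coefficient sequences O = (N + 1)^2 for order N."""
    return (order + 1) ** 2


def raw_bit_rate(order: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Uncompressed bit rate O * f_S * N_b in bits per second."""
    return hoa_coefficient_count(order) * sample_rate_hz * bits_per_sample


if __name__ == "__main__":
    # Values taken from the example in the text: N = 4, f_S = 48 kHz, N_b = 16.
    rate = raw_bit_rate(order=4, sample_rate_hz=48_000, bits_per_sample=16)
    print("O =", hoa_coefficient_count(4), "rate in bit/s =", rate)
```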

Summary of invention

Thus, compression of HOA representations is highly desirable. The compression of HOA sound field representations is proposed in patent applications EP 12305537.8, EP 12306569.0 and EP 13305558.2: these approaches have in common that they perform a sound field analysis and decompose the given HOA representation into a directional component and a residual ambient component. On one hand, the resulting compressed representation comprises a number of quantised signals, resulting from the perceptual coding of the directional signals and relevant coefficient sequences of the ambient HOA component. On the other hand, the resulting compressed representation comprises additional side information related to the quantised signals, which side information is necessary for the reconstruction of the HOA representation from its compressed version.
A problem to be solved by the invention is to further improve the compression of HOA representations. This problem is solved by the methods disclosed in claims 1 and 10. Apparatuses utilising these methods are disclosed in claims 2 and 11.
The invention deals with the coding of the side information related to the directional component, which additional compression is not addressed in the above-mentioned patent applications EP 12305537.8 , EP 12306569.0 and EP 13305558.2 . In this prior art, in order to efficiently code or compress a given HOA representation, it is analysed on a frame-by-frame basis and is decomposed into a directional component and a residual ambient component, whereby at compressor side the direction values are estimated based on a pre-defined grid of directions, and these direction values are used for the extraction of directional signals from the given HOA representation in the HOA compressor. According to the invention, the resulting indices of directional signals as well as the direction values are encoded in a particular manner.
In principle, the inventive method is suited for encoding directional side information for a 3D audio signal, and includes the steps:

receiving a data set of dominant signal direction values for a current audio signal frame and a data set of indices of corresponding directional signals, wherein the dominant signal directions were estimated from candidates determined from a pre-defined grid of directions, and the determined dominant signal direction values were used for an extraction of said directional signals from said 3D audio signal;
encoding said directional side information for said current audio signal frame by quantising, using said pre-defined grid, the direction values in said received data set of dominant signal directions, and by establishing a vector that defines which directions from a set of pre-defined directions are present in said current audio signal frame.

In principle the inventive apparatus is suited for encoding directional side information for a 3D audio signal, said apparatus including:

means being adapted for encoding said directional side information for a current audio signal frame, which means receive a data set of dominant signal direction values for said current audio signal frame and a data set of indices of corresponding directional signals, wherein the dominant signal directions were estimated from candidates determined from a pre-defined grid of directions, and the determined dominant signal direction values were used for an extraction of said directional signals from said 3D audio signal, and which means quantise, using said pre-defined grid, the direction values in said received data set of dominant signal directions, and establish a vector that defines which directions from a set of pre-defined directions are present in said current audio signal frame.

In principle the inventive method is suited for decoding directional side information for a 3D audio signal which directional side information was encoded according to the above encoding method, and includes the steps:

receiving for a current audio signal frame direction values quantised according to said pre-defined grid and a vector that comprises encoded indices about which directions from said set of pre-defined directions are present in said current audio signal frame;
re-quantising according to said pre-defined grid said quantised direction values, and decoding said vector;
providing from said re-quantised direction values and said decoded vector a data set of dominant signal direction values for said current audio signal frame and a data set of indices of corresponding directional signals.

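In principle the inventive apparatus is suited for decoding directional side information for a 3D audio signal, which directional side information was encoded according to the above encoding method, said apparatus including means being adapted for:

receiving for a current audio signal frame direction values quantised according to said pre-defined grid and a vector that comprises encoded indices about which directions from said set of pre-defined directions are present in said current audio signal frame;
re-quantising according to said pre-defined grid said quantised direction values, and decoding said vector;
providing from said re-quantised direction values and said decoded vector a data set of dominant signal direction values for said current audio signal frame and a data set of indices of corresponding directional signals.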

Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.

Brief description of drawings

Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:

Fig. 1: Block diagram for an HOA compression including the encoding of directional side information;
Fig. 2: Block diagram for an HOA decompression including the decoding of directional side information;
Fig. 3: Spherical coordinate system;
Fig. 4: Exemplary illustration of direction estimation;
Fig. 5: Activity diagram related to Fig. 4.

Description of embodiments

In order to compress a given HOA representation, it is analysed on a frame-by-frame basis and decomposed into a directional component and a residual ambient component, for example as described in patent applications EP 12305537.8 , EP 12306569.0 and EP 13305558.2 .
As an example for embedding directional side information coding according to the invention in an HOA processing that uses splitting into directional and residual ambient components, Fig. 1 shows an HOA compression processing as described in patent application EP 13305558.2 , in which - following estimation of dominant sound source directions - a coding of directional side information is carried out. For the HOA representation compression a frame-wise processing with non-overlapping input frames C(k) of HOA coefficient sequences of length L is used, where k denotes the frame index. The first step or stage 11/12 in Fig. 1 is optional and consists of concatenating the non-overlapping k -th and (k-1)-th frames of HOA coefficient sequences into a long frame C̃(k) as C̃(k):=[C(k-1) C(k)] (the tilde symbol indicates long overlapping frames). This long frame is 50% overlapped with an adjacent long frame and is successively used for the estimation of dominant sound source directions. If step/stage 11/12 is not present, the tilde symbol has no specific meaning.
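The optional concatenation into long frames can be sketched as follows. This is a minimal illustration, assuming that a frame is stored as a list of O channel rows with L samples each; the function names are hypothetical.

```python
def long_frame(prev_frame, cur_frame):
    """Concatenate C(k-1) and C(k) along time: C~(k) = [C(k-1) C(k)]."""
    if len(prev_frame) != len(cur_frame):
        raise ValueError("frames must have the same number of HOA channels")
    return [p + c for p, c in zip(prev_frame, cur_frame)]


def long_frames(frames):
    """Yield C~(k) for k = 1, 2, ... from a list of frames C(0), C(1), ..."""
    for k in range(1, len(frames)):
        yield long_frame(frames[k - 1], frames[k])


if __name__ == "__main__":
    # Toy data: O = 2 channels, L = 2 samples per frame (assumed values).
    frames = [[[0, 1], [2, 3]], [[4, 5], [6, 7]], [[8, 9], [10, 11]]]
    for k, lf in enumerate(long_frames(frames), start=1):
        print(k, lf)
```

Because each long frame reuses the previous short frame, adjacent long frames share half of their samples, which is the 50% overlap mentioned above.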
In step or stage 13 dominant sound sources are estimated. The estimation provides a data set Ĩ_DIR,ACT(k) ⊆ {1,...,D} of indices of directional signals that have been detected as well as the set G̃_Ω,ACT(k) of corresponding direction estimates. D denotes the maximum number of directional signals, which has to be set before starting the HOA compression. In step or stage 14, the current frame C̃(k) of HOA coefficient sequences is decomposed into a number of directional signals X_DIR(k-2) belonging to the directions contained in the set G̃_Ω,ACT(k) and a residual ambient HOA component C_AMB(k-2). The delay of two frames is introduced as a result of overlap-add processing in order to obtain smooth signals. It is assumed that X_DIR(k-2) contains a total of D channels, of which however only those corresponding to the active directional signals are non-zero. The indices specifying these channels are assumed to be output in the data set I_DIR,ACT(k-2). Additionally, the decomposition in step/stage 14 provides some parameters ζ(k-2) which are used at decompression side for predicting portions of the original HOA representation from the directional signals. In step or stage 15, the number of coefficients of the ambient HOA component C_AMB(k-2) is reduced so as to contain only O_RED + D - N_DIR,ACT(k-2) non-zero HOA coefficient sequences, where N_DIR,ACT(k-2) = |I_DIR,ACT(k-2)| indicates the cardinality of the data set I_DIR,ACT(k-2), i.e. the number of active directional signals in frame k-2. Since the ambient HOA component is assumed to be always represented by a minimum number O_RED of HOA coefficient sequences, this problem can actually be reduced to the selection of the remaining D - N_DIR,ACT(k-2) HOA coefficient sequences out of the possible O - O_RED ones. In order to obtain a smooth reduced ambient HOA representation, this choice is accomplished such that, compared to the choice taken at the previous frame k-3, as few changes as possible will occur.
The final ambient HOA representation with the reduced number of O_RED + D - N_DIR,ACT(k-2) non-zero coefficient sequences is denoted by C_AMB,RED(k-2). The indices of the chosen ambient HOA coefficient sequences are output in the data set I_AMB,ACT(k-2). In step/stage 16, the active directional signals contained in X_DIR(k-2) and the HOA coefficient sequences contained in C_AMB,RED(k-2) are assigned to the frame Y(k-2) of I channels for individual perceptual encoding.
According to the present invention, the data set Ĩ_DIR,ACT(k) ⊆ {1,...,D} of indices of directional signals and the data set G̃_Ω,ACT(k) of corresponding direction value estimates from the estimation step/stage 13 are fed to a step or stage 18 that encodes the directional side information as described in the following. Step/stage 18 outputs a vector a(k) denoting which directional signals are active in frame k, as well as a coded representation of all directions. The values of this coded representation can be entropy encoded.
The HOA decompression processing described in patent application EP 13305558.2, together with an additional step or stage 34 for decoding the received encoded directional side information, is depicted in Fig. 2. Step/stage 34 receives vector a(k) denoting which directional signals are active in frame k, and the coded representation of all directions. Step/stage 34 decodes this directional side information as described below and outputs the data set Ĩ_DIR,ACT(k) ⊆ {1,...,D} of indices of directional signals and the set G̃_Ω,ACT(k) of corresponding direction estimates.
In step or stage 31 a perceptual decoding of the I received encoded signals is performed in order to obtain the I decoded signals in Ŷ(k-2). In signal re-distributing step or stage 32, the perceptually decoded signals in Ŷ(k-2) are redistributed in order to recreate the frame X̂_DIR(k-2) of directional signals and the frame Ĉ_AMB,RED(k-2) of the ambient HOA component. The information about how to re-distribute the signals is obtained by reproducing the assigning operation performed for the HOA compression, using the index data sets Ĩ_DIR,ACT(k) and I_AMB,ACT(k-2).
In composition step or stage 33, a current frame Ĉ(k-3) of the desired total HOA representation is re-composed using the frame X̂_DIR(k-2) of the directional signals, the set Ĩ_DIR,ACT(k) of the active directional signal indices together with the set G̃_Ω,ACT(k) of the corresponding directions, the parameters ζ(k-2) for predicting portions of the HOA representation from the directional signals, and the frame Ĉ_AMB,RED(k-2) of HOA coefficient sequences of the reduced ambient HOA component. I.e., directional signals with respect to uniformly distributed directions are predicted from the directional signals (X̂_DIR(k-2)) using the received parameters (ζ(k-2)) for such prediction, and thereafter the current decompressed frame (Ĉ(k-3)) is re-composed from the frame of directional signals (X̂_DIR(k-2)), the predicted portions and the reduced ambient HOA component (Ĉ_AMB,RED(k-2)).
As mentioned above, in patent application EP 13305558.2 the directional HOA component for the k-th frame is represented by a number D_ACT(k) of directional signals and additional side information. This side information comprises on one hand the set $\tilde{I}_{DIR,ACT}(k) := \{ i_{ACT,d}(k) \mid d = 1, \dots, D_{ACT}(k) \}$
of indices i_ACT,d(k) of directional signals that have been detected. On the other hand, the side information consists of the set $\tilde{G}_{\Omega,ACT}(k) := \{ \Omega_{ACT,d}(k) \mid d = 1, \dots, D_{ACT}(k) \}$
of the corresponding directions Ω_ACT,d(k).
To illustrate the meaning of the side information by way of an example, the case is considered where the maximum number D of directional signals is equal to two. Fig. 4 illustrates an exemplary result of the direction estimation for the first 7 frames (cf. Fig. 3 and the corresponding description of the representation of a direction in a spherical coordinate system). The dots in Fig. 4 represent a grid of possible directions. The direction estimates related to the directional signal with index 1 are marked by diamonds and the direction estimates related to the directional signal with index 2 are marked by crosses. The directional signal with index 1 representing a first trajectory is supposed to be active from frame k = 1 to k = 4, whereas the directional signal with index 2 representing a second trajectory is supposed to be active from frame k = 3 to k = 7.
This activity information is additionally illustrated in Fig. 5, which shows for each frame index k if the direction with the respective index is active (indicated by white) or not (indicated by black). The resulting index sets Ĩ_DIR,ACT(k), k = 1,2,...,7, corresponding to Fig. 4 are summarised in Table 1:

k               1     2     3       4       5     6     7
Ĩ_DIR,ACT(k)    {1}   {1}   {1,2}   {1,2}   {2}   {2}   {2}

Coding of the indices of directional signals

Because the indices of active directional signals correspond to the indices of D channels to which these directional signals are assigned, for coding the indices of the directional signals a bit array of length D is used that is represented by the vector $a(k) := \begin{bmatrix} a_{1}(k) & a_{2}(k) & \dots & a_{D}(k) \end{bmatrix}^{T}$,
where (·)^T denotes transposition and the elements a_i(k), i ∈ {1,...,D}, are defined as follows: $a_{i}(k) = \begin{cases} 1 & \text{if the } i\text{-th directional signal is active in frame } k \\ 0 & \text{else} \end{cases}$
There are exactly D _ACT (k) non-zero elements in vector a (k), which corresponds to the number of active directional signals.
For the compression of 3D audio signals it is reasonable to assume not more than four to eight directional signals in a frame, and therefore a(k) will contain e.g. 4 to 8 bits per frame. A current frame k will contain none, part or all of this set of pre-determined directional signals. For a current frame k the current vector a (k) is transferred to the decoder or decompression side.
To illustrate the coding of indices of directional signals, in Table 2 the values of the vector a (k), k=1,...,7, for the index sets Ĩ _DIR,ACT(k) given in Table 1 are provided as an example:

k       1         2         3         4         5         6         7
a(k)    [1 0]^T   [1 0]^T   [1 1]^T   [1 1]^T   [0 1]^T   [0 1]^T   [0 1]^T
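The mapping between an index set and the bit array a(k) can be sketched in Python as follows. The function names are hypothetical, and a(k) is held as a plain list of D bits; the sample data are the index sets of Table 1 with D = 2.

```python
def encode_activity(active_indices, max_signals):
    """Return a(k) as a list of D bits; element i-1 is 1 if signal i is active."""
    for i in active_indices:
        if not 1 <= i <= max_signals:
            raise ValueError(f"index {i} outside 1..{max_signals}")
    return [1 if i in active_indices else 0 for i in range(1, max_signals + 1)]


def decode_activity(bits):
    """Recover the ascending list of active 1-based indices from a(k)."""
    return [i for i, bit in enumerate(bits, start=1) if bit]


# Index sets from Table 1 (frames k = 1..7), maximum number D = 2.
TABLE_1 = [{1}, {1}, {1, 2}, {1, 2}, {2}, {2}, {2}]

if __name__ == "__main__":
    for k, index_set in enumerate(TABLE_1, start=1):
        print(k, encode_activity(index_set, max_signals=2))
```

The number of non-zero bits equals D_ACT(k), so the decoder learns from a(k) alone how many direction values to expect.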

Coding of the directions

The direction values used for a frame may vary from frame to frame. Assuming that the indices i_ACT,d(k), d = 1,...,D_ACT(k) are ordered in an ascending order, it is sufficient to code the direction values Ω_ACT,d(k), d = 1,...,D_ACT(k) one after the other in order to be able to unambiguously link them to the indices. In other words, given the vector a(k) and the sequence of coded directions, it can be assumed that the coded direction value Ω_ACT,1(k) corresponds to the index indicated by the first non-zero element in a(k), and the coded direction value Ω_ACT,2(k) is assumed to correspond to the index indicated by the second non-zero element in a(k), etc. As mentioned above, for frame k a coded representation of all direction values in the set G̃_Ω,ACT(k) is formed, and this coded representation is transferred to the decoder or decompression side.
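The implicit linking of direction values to signal indices can be illustrated with a short sketch. The function names and the integer direction values are hypothetical placeholders; any coded direction representation could take their place.

```python
def pack_directions(directions_by_index):
    """Order direction values by ascending signal index for transmission."""
    return [directions_by_index[i] for i in sorted(directions_by_index)]


def unpack_directions(activity_bits, coded_directions):
    """Link the d-th received direction to the d-th non-zero element of a(k)."""
    active = [i for i, bit in enumerate(activity_bits, start=1) if bit]
    if len(active) != len(coded_directions):
        raise ValueError("number of directions must equal number of active bits")
    return dict(zip(active, coded_directions))


if __name__ == "__main__":
    # Assumed example: D = 4, signals 2 and 4 active with placeholder values.
    directions = {4: 17, 2: 803}
    a_k = [0, 1, 0, 1]
    sent = pack_directions(directions)
    print(sent, unpack_directions(a_k, sent))
```

No explicit index needs to accompany a direction value, because the order of transmission together with a(k) already determines the assignment.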
In the following, the problem of how to efficiently encode the direction values Ω_ACT,d(k), d = 1,...,D_ACT(k) for generating this coded representation is addressed. In principle, assuming a spherical coordinate system as shown in Fig. 3, each direction Ω_ACT,d(k) can be unambiguously represented by the tuple $\Omega_{ACT,d}(k) = (\theta_{d}(k), \phi_{d}(k)),$
where θ_d(k) ∈ [0,π] denotes an inclination angle measured from the polar axis z and φ_d(k) ∈ [0,2π[ indicates an azimuth angle measured counter-clockwise in the x - y plane from the x axis.
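For readers who prefer Cartesian coordinates, the standard conversion of such an (inclination, azimuth) tuple into a unit vector is sketched below. The conversion follows directly from the angle definitions above; the function name is hypothetical.

```python
import math


def direction_to_unit_vector(theta, phi):
    """Map inclination theta in [0, pi] and azimuth phi in [0, 2*pi) to (x, y, z)."""
    if not 0.0 <= theta <= math.pi:
        raise ValueError("inclination must lie in [0, pi]")
    if not 0.0 <= phi < 2.0 * math.pi:
        raise ValueError("azimuth must lie in [0, 2*pi)")
    return (
        math.sin(theta) * math.cos(phi),  # x
        math.sin(theta) * math.sin(phi),  # y
        math.cos(theta),                  # z (polar axis)
    )


if __name__ == "__main__":
    print(direction_to_unit_vector(0.0, 0.0))                  # along the polar axis z
    print(direction_to_unit_vector(math.pi / 2, math.pi / 2))  # close to the y axis
```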
On one hand, the inclination and azimuth angles could be quantised individually, in particular by assuming M_θ = 2^(Q_θ) possible discrete values for the inclination angle and M_φ = 2^(Q_φ) possible discrete values for the azimuth angle, resulting in a total of Q_θ + Q_φ bits required for the coding of a single direction. On the other hand, the disadvantage of such individual quantisation is that the quantised direction values will likely not match the pre-defined grid of directions exactly. In order not to introduce errors due to direction quantisation when the HOA decompressor re-synthesises the HOA representation of the directional signals, the extraction of directional signals from the given HOA representation in the HOA compressor in step/stage 13 in Fig. 1 is based on that pre-defined grid of directions. Patent applications EP 12306569.0 and EP 13305558.2 describe how directional signals can be extracted from an HOA representation.
The problem that the directions of the quantised direction values do not exactly match the estimated directions can be solved in a first embodiment by exploiting the fact that the splitting into directional and ambient residual components and the direction estimation described in patent applications EP 12306569.0 and EP 13305558.2 are based on a direction search which is carried out on a fixed grid of directions (cf. patent application EP 13305156.5 for a description of direction estimation as an example). Such a fixed grid represents the above-mentioned re-quantisation. In particular, the estimated direction Ω_ACT,d(k) is an element of a set {Ω_q | q = 1,...,Q} of Q predefined directions. Exploiting this knowledge, the direction values in step/stage 18 can be quantised according to this pre-defined grid, by representing a direction by the index q ∈ {1,...,Q}. Then, a quantised representation of a single direction will require ⌈log₂(Q)⌉ bits. For instance, using a grid consisting of Q = 900 predefined directions would require 10 bits for a corresponding quantisation.
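The grid-index quantisation of the first embodiment can be sketched as follows. The grid contents, the function names and the choice to write q-1 as a fixed-length binary word are assumptions made for illustration only.

```python
def bits_per_direction(grid_size):
    """ceil(log2(Q)) bits for a grid of Q >= 2 directions, using integer arithmetic."""
    if grid_size < 2:
        raise ValueError("grid must contain at least two directions")
    return (grid_size - 1).bit_length()


def quantise_direction(direction, grid):
    """Return the 1-based grid index q of a direction that lies on the grid."""
    return grid.index(direction) + 1  # raises ValueError if not a grid point


def requantise_direction(q, grid):
    """Inverse mapping at decompression side: index q back to the grid direction."""
    return grid[q - 1]


def write_index(q, grid_size):
    """Fixed-length binary word for q in {1..Q} (q-1 is written, an assumed layout)."""
    return format(q - 1, f"0{bits_per_direction(grid_size)}b")


if __name__ == "__main__":
    toy_grid = [(0.0, 0.0), (1.0, 0.5), (2.0, 3.0)]  # hypothetical (theta, phi) grid
    q = quantise_direction((1.0, 0.5), toy_grid)
    print(q, requantise_direction(q, toy_grid), bits_per_direction(900))
```

Since the estimated directions are grid points by construction, the index round trip is lossless, which is the point of quantising on the same grid that was used for the estimation.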
Such coding of the directions offers the further advantage that it is not recursive, meaning that no knowledge of the direction estimates from previous frames is required for the decoding of the directions. However, a disadvantage of such processing is that in general it does not achieve the related minimum possible average bit rate.
At decompressor side, for a current audio signal frame C(k), the possibly entropy encoded representation of all directions is received, wherein these direction values were quantised according to said pre-defined grid, and vector a(k) is received that comprises the encoded indices about which directions from the set of pre-defined directions are present in the current audio signal frame C(k). If necessary, an entropy decoding takes place. The quantised direction values are re-quantised according to said pre-defined grid, and vector a(k) is decoded. From the re-quantised direction values and the decoded vector, a data set G̃_Ω,ACT(k) of dominant signal direction values for said current audio signal frame C(k) and a data set Ĩ_DIR,ACT(k) of indices of corresponding directional signals are provided.
In a second embodiment, the average bit rate for the coding of the directions for successive frames is further reduced by exploiting the relation between the direction estimates of successive frames. In particular, the direction estimation as proposed in patent application EP 13305558.2 is based on a sound source movement model, which predicts the direction of a sound source in the k-th frame based on its movement between the (k-2)-th and (k-1)-th frame.
Therefore the quantised direction values (e.g. the direction index as proposed in the first embodiment) of the k-th frame are in the second embodiment coded using entropy coding, e.g. Huffman coding. The individual code words for the direction values have a variable bit size depending on the frame-adaptively determined probability of the individual directions. In particular, direction values with a high probability are coded using small-size code words and direction values with a low probability are coded using large-size code words.
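How probabilities translate into variable-length code words can be illustrated with a textbook Huffman construction. This is a generic sketch, not the code table of the application; the probabilities in the example are invented and the function name is hypothetical.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Build a prefix code {symbol: bit string} from {symbol: probability}."""
    if len(probabilities) == 1:
        return {next(iter(probabilities)): "0"}
    tiebreak = count()  # unique counter so that dicts are never compared
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)  # two least probable subtrees
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


if __name__ == "__main__":
    # Invented probabilities for four direction indices q = 1..4.
    code = huffman_code({1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125})
    for q in sorted(code):
        print(q, code[q])
```

The most probable direction index receives the shortest code word, which is what lowers the average bit rate compared with the fixed-length index.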
Such a coding strategy requires computation of the probability for the individual directions during HOA decompression in the same way as for the HOA compression. At decompression side, the received entropy encoded quantised direction values are entropy decoded, wherein a probability of the individual directions is determined frame-adaptively. However, this requires a high computational complexity in the HOA decompressor for computing the probabilities of the Q possible directions in each frame. Further, the processing is recursive, meaning that the decoding of a direction at decompression side is based on the knowledge of the directions from the previous two frames.
In a third embodiment, in order to reduce the computational complexity introduced by evaluating frame-by-frame the a-priori probabilities for all Q possible directions in the HOA decompressor as described for the second embodiment, the number of possible probabilities is constrained to Q by making the probabilities dependent on the corresponding direction estimate in the last frame.
One possibility to define such conditional probabilities is to set them inversely proportional to the angular distance between a direction estimate in the current frame and the corresponding direction estimate in the last frame. Another possibility is to measure the conditional a-priori probabilities of the direction estimates from some HOA representations instead of setting them.
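The first possibility, probabilities inversely proportional to angular distance, can be sketched as follows. Two points are assumptions of this sketch and are not stated in the text: a small offset eps avoids a division by zero when the direction does not change, and the weights are normalised to sum to one. The function names and the toy grid are hypothetical.

```python
import math


def angular_distance(a, b):
    """Great-circle angle between two (inclination, azimuth) directions."""
    (t1, p1), (t2, p2) = a, b
    c = math.cos(t1) * math.cos(t2) + math.sin(t1) * math.sin(t2) * math.cos(p1 - p2)
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding errors


def conditional_probabilities(prev_q, grid, eps=0.05):
    """P(q | prev_q) proportional to 1 / (distance + eps); eps is an assumption."""
    prev = grid[prev_q - 1]
    weights = [1.0 / (angular_distance(prev, d) + eps) for d in grid]
    total = sum(weights)
    return {q: w / total for q, w in enumerate(weights, start=1)}


if __name__ == "__main__":
    toy_grid = [(math.pi / 2, 0.0), (math.pi / 2, 0.1), (math.pi / 2, math.pi)]
    print(conditional_probabilities(1, toy_grid))
```

Each of the Q possible previous directions yields one such probability table, which could then be turned into an entropy code table for the directions of the following frame.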
At decompression side, the received entropy encoded quantised direction values are frame-adaptively entropy decoded depending on a probability of the individual directions, whereby the number of possible probabilities is constrained to the number of directions in the pre-defined grid and the probabilities are dependent on the corresponding direction estimate in the last frame. Such a technique requires that the HOA decompressor holds for each of the Q possible test directions an entropy decoding table (e.g. a Huffman table) containing Q code words and respective indices of the temporally following directions.
In a fourth embodiment, instead of considering the conditioned probabilities for the entropy coding, the non-conditional probabilities of the quantised directions are employed. Such probabilities can be measured from some test HOA representations, or can be assigned according to expectations about typical HOA sound field representation. For example, high probabilities are assigned for directions in the front and low probabilities for directions in the back.
Such a processing has the advantage of not being recursive, i.e. the decoding of a direction value is not based on the knowledge of any directions from previous frames. However, due to the use of non-conditional probabilities, in general the efficiency of this kind of processing is likely lower than that of the third embodiment.
In a fifth embodiment, in order to alleviate computational load and extensive storage requirements for the HOA decompressor, for each frame it is decided which one of the above embodiments is used, resulting in a set of four (or fewer) modes:

processing according to the first embodiment;
processing according to a combination of the first embodiment and the second embodiment;
processing according to a combination of the first embodiment and the third embodiment;
processing according to a combination of the first embodiment and the fourth embodiment.

The mode decision can be indicated by a Boolean variable which is prepended to the coded representation of a direction. Such a mode decision will in most cases minimise the bit amount of the corresponding code.
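A per-direction mode decision can be sketched for the simplest case of two modes, where a single Boolean flag suffices. The restriction to two modes, the flag values and the function names are assumptions of this sketch; with more modes a longer mode code word would be needed.

```python
def choose_mode(fixed_word, entropy_word):
    """Prepend a 1-bit flag: '0' = fixed-length index, '1' = entropy code word."""
    if len(entropy_word) < len(fixed_word):
        return "1" + entropy_word
    return "0" + fixed_word


def parse_mode(bits):
    """Split a coded direction into its mode name and the remaining payload bits."""
    mode = "entropy" if bits[0] == "1" else "fixed"
    return mode, bits[1:]


if __name__ == "__main__":
    # Hypothetical code words for the same direction under both modes.
    coded = choose_mode(fixed_word="0000000001", entropy_word="110")
    print(coded, parse_mode(coded))
```

The flag costs one extra bit per coded direction, so the selection pays off whenever the shorter of the two candidate code words saves more than that bit on average.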
The inventive processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the inventive processing.
The invention can be applied in any application where some directional information has to be efficiently coded, e.g. object based 3D audio where directional signals and object based side information have to be coded.

Claims

Method for encoding directional side information for a 3D audio signal, characterised by the steps:
- receiving a data set (G̃_Ω,ACT(k)) of dominant signal direction values for a current audio signal frame (C(k)) and a data set (Ĩ_DIR,ACT(k)) of indices of corresponding directional signals, wherein the dominant signal directions were estimated (13) from candidates determined from a pre-defined grid of directions, and the determined dominant signal direction values were used for an extraction (13) of said directional signals from said 3D audio signal;

- encoding (18) said directional side information for said current audio signal frame (C(k)) by quantising, using said pre-defined grid, the direction values in said received data set (G̃_Ω,ACT(k)) of dominant signal directions, and by establishing a vector (a(k)) that defines which directions from a set of pre-defined directions are present in said current audio signal frame (C(k)).
Apparatus for encoding directional side information for a 3D audio signal, said apparatus including:
means (18) being adapted for encoding (18) said directional side information for a current audio signal frame ( C (k)), which means receive a data set (G̃ _Ω,ACT(k)) of dominant signal direction values for said current audio signal frame (C(k)) and a data set (Ĩ _DIR,ACT(k)) of indices of corresponding directional signals, wherein the dominant signal directions were estimated (13) from candidates determined from a pre-defined grid of directions, and the determined dominant signal direction values were used for an extraction (13) of said directional signals from said 3D audio signal,

and which means (18) quantise, using said pre-defined grid, the direction values in said received data set (G̃_Ω,ACT(k)) of dominant signal directions, and establish a vector (a(k)) that defines which directions from a set of pre-defined directions are present in said current audio signal frame (C(k)).
Method according to claim 1, or apparatus according to claim 2, wherein frame adaptively a probability of the individual directions is determined based on the knowledge of the directions from the previous two frames, and wherein said quantised direction values are coded using entropy coding with variable bit size code words for the direction values depending on said probability.
Method according to claim 1, or apparatus according to claim 2, wherein frame adaptively a probability of the individual directions is determined whereby the number of possible probabilities is constrained to the number of directions in said pre-defined grid and said probabilities are dependent on the corresponding direction estimate in the last frame, and wherein said quantised direction values are coded using entropy coding with variable bit size code words for the direction values depending on said probability.
Method according to the method of claim 4, or apparatus according to the apparatus of claim 4, wherein said conditional probabilities are inversely proportional to the angular distance between a direction estimate in the current frame and the corresponding direction estimate in the last frame.
Method according to the method of claim 4, or apparatus according to the apparatus of claim 4, wherein said conditional a-priori probabilities are measured from several HOA representations.
Method according to claim 1, or apparatus according to claim 2, wherein said quantised direction values are coded using entropy coding with variable bit size code words for the direction values depending on non-conditional a-priori probabilities.
Method according to the method of one of claims 1, 3, 4 or 7, or apparatus according to the apparatus of one of claims 2, 3, 4 or 7, wherein for each 3D audio signal frame (C(k)) it is decided which one of the processings according to claims 1, 3, 4 or 7 is carried out, and a corresponding mode code word representing said selection is provided.
Method according to the method of one of claims 1 and 3 to 8, or apparatus according to the apparatus of one of claims 2 to 8, wherein said 3D audio signal is an HOA audio signal.
Computer program product comprising instructions which, when carried out on a computer, perform the method according to one of claims 1 and 3 to 9.
Method for decoding directional side information for a 3D audio signal, which directional side information was encoded according to claim 1, characterised by the steps:
- receiving for a current audio signal frame (C(k)) direction values quantised according to said pre-defined grid and a vector (a(k)) that comprises encoded indices about which directions from said set of pre-defined directions are present in said current audio signal frame (C(k));

- re-quantising (34) according to said pre-defined grid said quantised direction values and decoding (34) said vector (a(k));

- providing (34) from said re-quantised direction values and said decoded vector a data set (G̃ _Ω,ACT(k)) of dominant signal direction values for said current audio signal frame (C(k)) and a data set (Ĩ _DIR,ACT(k)) of indices of corresponding directional signals.
Apparatus for decoding directional side information for a 3D audio signal, which directional side information was encoded according to claim 1, said apparatus including means (34) being adapted for:
- receiving for a current audio signal frame (C(k)) direction values quantised according to said pre-defined grid and a vector (a(k)) that comprises encoded indices about which directions from said set of pre-defined directions are present in said current audio signal frame (C(k));

- re-quantising according to said pre-defined grid said quantised direction values and decoding said vector (a(k));

- providing from said re-quantised direction values and said decoded vector a data set (G̃ _Ω,ACT(k)) of dominant signal direction values for said current audio signal frame (C(k)) and a data set (Ĩ _DIR,ACT(k)) of indices of corresponding directional signals.
Method according to claim 11, or apparatus according to claim 12, wherein said received quantised direction values are entropy encoded quantised direction values and frame adaptively an a-priori probability of the individual directions is determined based on the knowledge of the directions from the previous two frames, and wherein said entropy encoded quantised direction values are entropy decoded depending on said a-priori probability.
Method according to claim 11, or apparatus according to claim 12, wherein said received quantised direction values are entropy encoded quantised direction values and frame adaptively an a-priori probability of the individual directions is determined whereby the number of possible a-priori probabilities is constrained to the number of directions in said pre-defined grid and said a-priori probabilities are dependent on the corresponding direction estimate in the last frame, and wherein said entropy encoded quantised direction values are entropy decoded with variable bit size code words depending on said a-priori probability.
Method according to the method of one of claims 11, 13 and 14, or apparatus according to the apparatus of one of claims 12 to 14, wherein said 3D audio signal is an HOA audio signal.
Method according to the method of claims 11 and 14, or apparatus according to the apparatus of claims 12 and 14, wherein said decoding of directional side information is carried out in an HOA decompressor and this HOA decompressor holds for each possible direction of said pre-defined grid an entropy decoding table containing a corresponding code word and respective indices of the temporally following directions.