OA18600A

OA18600A - Layered coding for compressed sound or sound field representations.

Info

Publication number: OA18600A
Application number: OA1201800124
Authority: OA
Inventors: Sven Kordon; Alexander Krueger
Original assignee: Dolby International Ab
Priority date: 2015-10-08
Filing date: 2016-10-07
Publication date: 2018-12-28

Abstract

The present document relates to a method of layered encoding of a compressed sound representation of a sound or sound field. The compressed sound representation comprises a basic compressed sound representation comprising a plurality of components, basic side information for decoding the basic compressed sound representation to a basic reconstructed sound representation of the sound or sound field, and enhancement side information including parameters for improving the basic reconstructed sound representation. The method comprises sub-dividing the plurality of components into a plurality of groups of components and assigning each of the plurality of groups to a respective one of a plurality of hierarchical layers, the number of groups corresponding to the number of layers, and the plurality of layers including a base layer and one or more hierarchical enhancement layers, adding the basic side information to the base layer, and determining a plurality of portions of enhancement side information from the enhancement side information and assigning each of the plurality of portions of enhancement side information to a respective one of the plurality of layers, wherein each portion of enhancement side information includes parameters for improving a reconstructed sound representation obtainable from data included in the respective layer and any layers lower than the respective layer. The document further relates to a method of decoding a compressed sound representation of a sound or sound field, wherein the compressed sound representation is encoded in a plurality of hierarchical layers that include a base layer and one or more hierarchical enhancement layers, as well as to an encoder and a decoder for layered coding of a compressed sound representation.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to European Patent Application No. 15306590.9, filed on October 8, 2015, and United States Patent Application No. 62/361,809, which are incorporated herein by reference in their entirety.

TECHNICAL FIELD

The present document relates to methods and apparatuses for layered audio coding. In particular, the present document relates to methods and apparatuses for layered audio coding of compressed sound (or sound field) representations, for example Higher-Order Ambisonics (HOA) sound (or sound field) representations.

BACKGROUND

For the streaming of a sound (or sound field) representation over a transmission channel with time-varying conditions, layered coding is a means to adapt the quality of the received sound representation to the transmission conditions, and in particular to avoid undesired signal dropouts.

For layered coding, the sound (or sound field) representation is usually subdivided into a high priority base layer of a relatively small size and additional enhancement layers with decremental priorities and arbitrary sizes. Each enhancement layer is typically assumed to contain incremental information to complement that of all lower layers in order to improve the quality of the sound (or sound field) representation. The amount of error protection for the transmission of the individual layers is controlled based on their priority. In particular, the base layer is provided with a high error protection, which is reasonable and affordable due to its small size.

However, there is a need for layered coding schemes for (extended versions of) special types of compressed representations of sound or sound fields, such as, for example, compressed HOA sound or sound field representations.

The present document addresses the above issues. In particular, methods and encoders/decoders for layered coding of compressed sound or sound field representations are described.

SUMMARY

According to an aspect, a method of layered encoding of a compressed sound representation of a sound or sound field is described. The compressed sound representation may include a basic compressed sound representation that includes a plurality of components. The plurality of components may be complementary components. The compressed sound representation may further include basic side information for decoding the basic compressed sound representation to a basic reconstructed sound representation of the sound or sound field. The compressed sound representation may yet further include enhancement side information including parameters for improving (e.g., enhancing) the basic reconstructed sound representation. The method may include sub-dividing (e.g., grouping) the plurality of components into a plurality of groups of components. The method may further include assigning (e.g., adding) each of the plurality of groups to a respective one of a plurality of hierarchical layers. The assignment may indicate a correspondence between respective groups and layers. Components assigned to a respective layer may be said to be included in that layer. The number of groups may correspond to (e.g., be equal to) the number of layers. The plurality of layers may include a base layer and one or more hierarchical enhancement layers. The plurality of hierarchical layers may be ordered, from the base layer, through the first enhancement layer, the second enhancement layer, and so forth, up to an overall highest enhancement layer (overall highest layer). The method may further include adding the basic side information to the base layer (e.g., including the basic side information in the base layer, or allocating the basic side information to the base layer, for example for purposes of transmission or storing). The method may further include determining a plurality of portions of enhancement side information from the enhancement side information. The method may yet further include assigning (e.g., adding) each of the plurality of portions of enhancement side information to a respective one of the plurality of layers. Each portion of enhancement side information may include parameters for improving a reconstructed (e.g., decompressed) sound representation obtainable from data included in (e.g., assigned or added to) the respective layer and any layers lower than the respective layer. The layered encoding may be performed for purposes of transmission over a transmission channel or for purposes of storing in a suitable storage medium, such as a CD, DVD, or Blu-ray Disc™, for example.
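The encoding steps of this aspect can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Layer` class, the `encode_layers` function, the 1-based layer numbering (layer 1 being the base layer), and the caller-supplied `derive_enhancement` callback are all assumptions introduced for illustration. The sketch only shows the bookkeeping: one group of components per layer, basic side information in the base layer, and one portion of enhancement side information per layer that covers the components of that layer and all lower layers.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class Layer:
    """Hypothetical container for the data assigned to one hierarchical layer."""
    index: int                                   # 1 = base layer (assumed convention)
    components: List[Any] = field(default_factory=list)
    basic_side_info: Optional[Any] = None        # only set for the base layer
    enhancement_side_info: Optional[Any] = None  # one portion per layer


def encode_layers(
    components: Sequence[Any],
    group_sizes: Sequence[int],
    basic_side_info: Any,
    derive_enhancement: Callable[[List[Any]], Any],
) -> List[Layer]:
    """Sub-divide components into groups and assign one group to each layer."""
    if not group_sizes or sum(group_sizes) != len(components):
        raise ValueError("group sizes must be non-empty and cover every component")

    # One group per layer: the number of groups equals the number of layers.
    layers: List[Layer] = []
    start = 0
    for position, size in enumerate(group_sizes):
        group = list(components[start:start + size])
        layers.append(Layer(index=position + 1, components=group))
        start += size

    # The basic side information is added to the base layer only.
    layers[0].basic_side_info = basic_side_info

    # Each layer receives the portion of enhancement side information that
    # matches the components of that layer and of all lower layers.
    for layer in layers:
        visible = [c for lower in layers[:layer.index] for c in lower.components]
        layer.enhancement_side_info = derive_enhancement(visible)
    return layers
```

How the parameters of each portion are actually computed is outside the scope of the sketch, which is why `derive_enhancement` is left to the caller.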

Configured as above, the proposed method enables layered coding to be applied efficiently to compressed sound representations comprising a plurality of components as well as basic side information and enhancement side information (e.g., independent basic side information and enhancement side information) having the properties set out above. In particular, the proposed method ensures that each layer includes suitable side information for reconstructing a reconstructed sound representation from the components included in any layers up to the layer in question. Therein the layers up to the layer in question are understood to include, for example, the base layer, the first enhancement layer, the second enhancement layer, and so forth, up to the layer in question. Thus, regardless of an actual highest usable layer (e.g., the layer below the lowest layer that has not been validly received, so that all layers below the highest usable layer and the highest usable layer itself have been validly received), a decoder would be enabled to improve or enhance a reconstructed sound representation, even though the reconstructed sound representation may be different from the complete (e.g., full) sound representation. In particular, regardless of the actual highest usable layer, it is sufficient for the decoder to decode a payload of enhancement side information for only a single layer (i.e., for the highest usable layer) to improve or enhance the reconstructed sound representation that is obtainable on the basis of all components included in layers up to the actual highest usable layer. That is, for each time interval (e.g., frame) only a single payload of enhancement side information has to be decoded. On the other hand, the proposed method allows full advantage to be taken of the reduction of required bandwidth that may be achieved when applying layered coding.

In embodiments, the components of the basic compressed sound representation may correspond to monaural signals (e.g., transport signals or monaural transport signals). The monaural signals may represent either predominant sound signals or coefficient sequences of an HOA representation. The monaural signals may be quantized.

In embodiments, the basic side information may include information that specifies decoding (e.g., decompression) of one or more of the plurality of components individually, independently of other components. For example, the basic side information may represent side information related to individual monaural signals, independently of other monaural signals. Thus, the basic side information may be referred to as independent basic side information.

In embodiments, the enhancement side information may represent enhancement side information. The enhancement side information may include prediction parameters for the basic compressed sound representation for improving (e.g., enhancing) the basic reconstructed sound representation that is obtainable from the basic compressed sound representation and the basic side information.

In embodiments, the method may further include generating a transport stream for transmission of the data of the plurality of layers (e.g., data assigned or added to respective layers, or otherwise included in respective layers). The base layer may have highest priority of transmission and the hierarchical enhancement layers may have decremental priorities of transmission. That is, the priority of transmission may decrease from the base layer to the first enhancement layer, from the first enhancement layer to the second enhancement layer, and so forth. An amount of error protection for transmission of the data of the plurality of layers may be controlled in accordance with respective priorities of transmission. Thereby, it can be ensured that at least a number of lower layers is reliably transmitted, while on the other hand reducing the overall required bandwidth by not applying excessive error protection to higher layers.

In embodiments, the method may further include, for each of the plurality of layers, generating a transport layer packet including the data of the respective layer. For example, for each time interval (e.g., frame), a respective transport layer packet may be generated for each of the plurality of layers.
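As an illustration of per-frame packetization with layer-dependent priority, the following Python sketch builds one packet per layer for a given frame. The `TransportPacket` structure, the `packetize_frame` function, the representation of error protection as a single number per layer, and the rule that protection must not increase toward higher layers are assumptions made for the example; the document itself does not prescribe a packet format or a protection scheme.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TransportPacket:
    """Hypothetical transport layer packet for one layer of one frame."""
    frame: int         # index of the time interval (frame)
    layer: int         # 1 = base layer (assumed convention)
    priority: int      # 1 = highest priority of transmission
    protection: float  # abstract amount of error protection for this packet
    payload: bytes     # data of the respective layer


def packetize_frame(
    frame: int,
    layer_payloads: Sequence[bytes],
    protection_by_layer: Sequence[float],
) -> List[TransportPacket]:
    """Generate one packet per layer, base layer first."""
    if len(layer_payloads) != len(protection_by_layer):
        raise ValueError("one protection value is needed per layer")
    # Assumed rule: a lower layer never gets less protection than a higher one.
    pairs = zip(protection_by_layer, protection_by_layer[1:])
    if any(lower < higher for lower, higher in pairs):
        raise ValueError("error protection must not increase toward higher layers")

    return [
        TransportPacket(
            frame=frame,
            layer=position + 1,
            priority=position + 1,  # priority decreases as the layer index grows
            protection=protection_by_layer[position],
            payload=payload,
        )
        for position, payload in enumerate(layer_payloads)
    ]
```

Keeping priority and protection as explicit packet fields lets a transport scheduler drop or weaken the highest layers first when bandwidth shrinks.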

In embodiments, the compressed sound representation may further include additional basic side information for decoding the basic compressed sound representation to the basic reconstructed sound representation. The additional basic side information may include information that specifies decoding of one or more of the plurality of components in dependence on respective other components. The method may further include decomposing the additional basic side information into a plurality of portions of additional basic side information. The method may yet further include adding the portions of additional basic side information to the base layer (e.g., including the portions of additional basic side information in the base layer, or allocating the portions of additional basic side information to the base layer, for example for purposes of transmission or storing). Each portion of additional basic side information may correspond to a respective layer and may include information that specifies decoding of one or more components assigned to the respective layer in dependence (only) on respective other components assigned to the respective layer and any layers lower than the respective layer. That is, each portion of additional basic side information specifies components in the respective layer to which that portion of additional basic side information corresponds without reference to any other components assigned to higher layers than the respective layer.

Configured as such, the proposed method avoids fragmentation of the additional basic side information by adding all portions to the base layer. In other words, all portions of additional basic side information are included in the base layer. The decomposition of the additional basic side information ensures that for each layer a portion of additional basic side information is available that does not require knowledge of components in higher layers. Thus, regardless of an actual highest usable layer, it is sufficient for the decoder to decode the portions of additional basic side information corresponding to layers up to the highest usable layer.
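The layer-wise decomposition can be sketched as follows. The data model is an assumption made for illustration: the additional (dependent) basic side information is represented as a mapping from each component to the parameters that tie it to other components, and the portion for a layer is obtained by keeping only the entries that refer to components in that layer or in lower layers. How an actual encoder derives each portion is not specified here; the filter merely stands in for that step, and `decompose_dependent_info` is a hypothetical name.

```python
from typing import Any, Dict, Hashable, List, Sequence

DependentInfo = Dict[Hashable, Dict[Hashable, Any]]


def decompose_dependent_info(
    layer_groups: Sequence[Sequence[Hashable]],
    dependent_info: DependentInfo,
) -> List[DependentInfo]:
    """Return one portion per layer; every portion is meant for the base layer.

    layer_groups[0] holds the components of the base layer, layer_groups[1]
    those of the first enhancement layer, and so on.
    """
    portions: List[DependentInfo] = []
    visible = set()  # components in the current layer and all lower layers
    for group in layer_groups:
        visible.update(group)
        portion = {
            component: {
                other: value
                for other, value in dependent_info.get(component, {}).items()
                if other in visible  # never refer to a higher layer
            }
            for component in group
        }
        portions.append(portion)
    return portions
```

Because every portion only mentions components that are visible at its own layer, a decoder that has received layers up to some index can use the matching portions without any data from higher layers.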

In embodiments, the additional basic side information may include information that specifies decoding (e.g., decompression) of one or more of the plurality of components in dependence on other components. For example, the additional basic side information may represent side information related to individual monaural signals in dependence on other monaural signals. Thus, the additional basic side information may be referred to as dependent basic side information.

In embodiments, the compressed sound representation may be processed for successive time intervals, for example time intervals of equal size. The successive time intervals may be frames. Thus, the method may operate on a frame basis, i.e., the compressed sound representation may be encoded in a frame-wise manner. The compressed sound representation may be available for each successive time interval (e.g., for each frame). That is, the compression operation by which the compressed sound representation has been obtained may operate on a frame basis.

In embodiments, the method may further include generating configuration information that indicates, for each layer, the components of the basic compressed sound representation that are assigned to that layer. Thus, the decoder can readily access the information needed for decoding without unnecessary parsing through the received data payloads.
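One possible shape for such configuration information is sketched below. The dictionary layout, the field names, the assumption that components are numbered consecutively across layers, and the function name `build_layer_config` are all illustrative choices, not a format defined by this document.

```python
from typing import Dict, List, Sequence, Union

ConfigEntry = Dict[str, Union[int, List[int]]]


def build_layer_config(group_sizes: Sequence[int]) -> List[ConfigEntry]:
    """List, for each layer, the indices of the components assigned to it.

    Layers are numbered from 1 (base layer); components are numbered from 0
    in transmission order. Both conventions are assumptions of this sketch.
    """
    config: List[ConfigEntry] = []
    start = 0
    for position, size in enumerate(group_sizes):
        config.append({
            "layer": position + 1,
            "component_indices": list(range(start, start + size)),
        })
        start += size
    return config
```

With such a table a decoder could look up which components belong to the usable layers before touching any payload.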

According to another aspect, a method of layered encoding of a compressed sound representation of a sound or sound field is described. The compressed sound representation may include a basic compressed sound representation that includes a plurality of components. The plurality of components may be complementary components. The compressed sound representation may further include basic side information (e.g., independent basic side information) and additional basic side information (e.g., dependent basic side information) for decoding the basic compressed sound representation to a basic reconstructed sound representation of the sound or sound field. The basic side information may include information that specifies decoding of one or more of the plurality of components individually, independently of other components. The additional basic side information may include information that specifies decoding of one or more of the plurality of components in dependence on respective other components. The method may include sub-dividing (e.g., grouping) the plurality of components into a plurality of groups of components. The method may further include assigning (e.g., adding) each of the plurality of groups to a respective one of a plurality of hierarchical layers. The assignment may indicate a correspondence between respective groups and layers. Components assigned to a respective layer may be said to be included in that layer. The number of groups may correspond to (e.g., be equal to) the number of layers. The plurality of layers may include a base layer and one or more hierarchical enhancement layers. The method may further include adding the basic side information to the base layer (e.g., including the basic side information in the base layer, or allocating the basic side information to the base layer, for example for purposes of transmission or storing). The method may further include decomposing the additional basic side information into a plurality of portions of additional basic side information and adding the portions of additional basic side information to the base layer (e.g., including the portions of additional basic side information in the base layer, or allocating the portions of additional basic side information to the base layer, for example for purposes of transmission or storing). Each portion of additional basic side information may correspond to a respective layer and include information that specifies decoding of one or more components assigned to the respective layer in dependence on respective other components assigned to the respective layer and any layers lower than the respective layer.

Configured as such, the proposed method ensures that for each layer, appropriate additional basic side information is available for decoding the components included in any layer up to the respective layer, without requiring valid reception or decoding (or in general, knowledge) of any higher layers. In the case of a compressed HOA representation, the proposed method ensures that in vector coding mode a suitable V-vector is available for all components belonging to layers up to the highest usable layer. In particular, the proposed method excludes the case that elements of a V-vector corresponding to components in higher layers are not explicitly signaled. Accordingly, the information included in the layers up to the highest usable layer is sufficient for decoding (e.g., decompressing) any components belonging to layers up to the highest usable layer. Thereby, appropriate decompression of respective reconstructed HOA representations for lower layers is ensured even if higher layers may not have been validly received by the decoder. On the other hand, the proposed method allows full advantage to be taken of the reduction of required bandwidth that may be achieved when applying layered coding.

Embodiments of this aspect may relate to the embodiments of the foregoing aspect.

According to another aspect, a method of decoding a compressed sound representation of a sound or sound field is described. The compressed sound representation may have been encoded in a plurality of hierarchical layers. The plurality of hierarchical layers may include a base layer and one or more hierarchical enhancement layers. The plurality of layers may have assigned thereto components of a basic compressed sound representation of a sound or sound field. In other words, the plurality of layers may include the components of the basic compressed sound representation. The components may be assigned to respective layers in respective groups of components. The plurality of components may be complementary components. The base layer may include basic side information for decoding the basic compressed sound representation. Each layer may include a portion of enhancement side information including parameters for improving a basic reconstructed sound representation obtainable from data included in the respective layer and any layers lower than the respective layer. The method may include receiving data payloads respectively corresponding to the plurality of hierarchical layers. The method may further include determining a first layer index indicating a highest usable layer among the plurality of layers to be used for decoding the basic compressed sound representation to the basic reconstructed sound representation of the sound or sound field. The method may further include obtaining the basic reconstructed sound representation from the components assigned to the highest usable layer and any layers lower than the highest usable layer, using the basic side information. The method may further include determining a second layer index that is indicative of which portion of enhancement side information should be used for improving (e.g., enhancing) the basic reconstructed sound representation. The method may yet further include obtaining a reconstructed sound representation of the sound or sound field from the basic reconstructed sound representation, referring to the second layer index.
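The decoding flow of this aspect can be sketched in Python as follows. The sketch assumes 1-based layer indices, a sentinel value `NO_ENHANCEMENT = 0` for the second layer index, layers represented as plain dictionaries, and caller-supplied `reconstruct_basic` and `enhance` functions; none of these names or conventions come from the document, and the actual signal processing is deliberately left out.

```python
from typing import Any, Callable, Dict, List, Sequence

NO_ENHANCEMENT = 0  # assumed sentinel: do not apply enhancement side information


def decode_frame(
    layers: Sequence[Dict[str, Any]],
    first_index: int,
    second_index: int,
    reconstruct_basic: Callable[[List[Any], Any], Any],
    enhance: Callable[[Any, Any], Any],
) -> Any:
    """Reconstruct one frame from the layers up to the highest usable layer.

    layers[0] is the base layer. first_index is the 1-based index of the
    highest usable layer; second_index selects the portion of enhancement
    side information, or equals NO_ENHANCEMENT.
    """
    if not 1 <= first_index <= len(layers):
        raise ValueError("the base layer must be usable for decoding")

    # Components of the highest usable layer and of all lower layers.
    usable = layers[:first_index]
    components = [c for layer in usable for c in layer["components"]]

    # Basic reconstruction uses the basic side information of the base layer.
    basic = reconstruct_basic(components, layers[0]["basic_side_info"])

    if second_index == NO_ENHANCEMENT:
        return basic  # the result equals the basic reconstruction
    portion = layers[second_index - 1]["enhancement_side_info"]
    return enhance(basic, portion)
```

Only a single portion of enhancement side information is read per frame, which mirrors the property stated for the encoding method.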

Configured as such, the proposed method ensures that the reconstructed sound representation has optimum quality, using the available (e.g., validly received) information to the best possible extent.

In embodiments, the components of the basic compressed sound representation may correspond to monaural signals (e.g., monaural transport signals). The monaural signals may represent either predominant sound signals or coefficient sequences of an HOA representation. The monaural signals may be quantized.

In embodiments, the basic side information may include information that specifies decoding (e.g., decompression) of one or more of the plurality of components individually, independently of other components. For example, the basic side information may represent side information related to individual monaural signals, independently of other monaural signals. Thus, the basic side information may be referred to as independent basic side information.

In embodiments, the enhancement side information may represent enhancement side information. The enhancement side information may include prediction parameters for the basic compressed sound representation for improving (e.g., enhancing) the basic reconstructed sound representation that is obtainable from the basic compressed sound representation and the basic side information.

In embodiments, the method may further include determining, for each layer, whether the respective layer has been validly received. The method may further include determining the first layer index as the layer index of a layer immediately below the lowest layer that has not been validly received.
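The rule for the first layer index can be written as a short Python function. The 1-based layer numbering and the return value 0 for the case in which even the base layer was not validly received are assumptions of this sketch, as is the function name `first_layer_index`.

```python
from typing import Sequence


def first_layer_index(valid: Sequence[bool]) -> int:
    """Return the index of the highest usable layer.

    valid[0] refers to the base layer (layer 1), valid[1] to the first
    enhancement layer (layer 2), and so on. The result is the layer
    immediately below the lowest layer that has not been validly received.
    A result of 0 (assumed convention) means that no layer is usable.
    """
    for position, received in enumerate(valid):
        if not received:
            # The invalid layer has 1-based index position + 1,
            # so the layer immediately below it has index position.
            return position
    return len(valid)  # every layer was validly received
```

Note that a validly received layer above a lost one does not raise the index: with layers 1, 2 and 4 received and layer 3 lost, the highest usable layer is layer 2.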

In embodiments, determining the second layer index may involve either determining the second layer index to be equal to the first layer index, or determining an index value as the second layer index that indicates not to use any enhancement side information when obtaining the reconstructed sound representation. In the latter case, the reconstructed sound representation may be equal to the basic reconstructed sound representation.

In embodiments, the data payloads may be received and processed for successive time intervals, for example time intervals of equal size. The successive time intervals may be frames. Thus, the method may operate on a frame basis. The method may further include, if the compressed sound representations for the successive time intervals can be decoded independently of each other, determining the second layer index to be equal to the first layer index.

In embodiments, the data payloads may be received and processed for successive time intervals, for example time intervals of equal size. The successive time intervals may be frames. Thus, the method may operate on a frame basis. The method may further include, for a given time interval among the successive time intervals, if the compressed sound representations for the successive time intervals cannot be decoded independently of each other, determining, for each layer, whether the respective layer has been validly received. The method may further include determining the first layer index for the given time interval as the smaller one of the first layer index of the time interval preceding the given time interval and the layer index of a layer immediately below the lowest layer that has not been validly received.

In embodiments, the method may further include, for the given time interval, if the compressed sound representations for the successive time intervals cannot be decoded independently of each other, determining whether the first layer index for the given time interval is equal to the first layer index for the preceding time interval. The method may further include, if the first layer index for the given time interval is equal to the first layer index for the preceding time interval, determining the second layer index for the given time interval to be equal to the first layer index for the given time interval. The method may further include, if the first layer index for the given time interval is not equal to the first layer index for the preceding time interval, determining an index value as the second layer index that indicates not to use any enhancement side information when obtaining the reconstructed sound representation.
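The frame-wise selection of both layer indices can be summarized in a small Python sketch. The function name `select_layer_indices`, the sentinel `NO_ENHANCEMENT = 0`, the 1-based layer numbering, and the requirement that the caller supplies the first layer index of the preceding frame are assumptions; how the very first frame of a stream is handled is not covered.

```python
from typing import Sequence, Tuple

NO_ENHANCEMENT = 0  # assumed sentinel: skip the enhancement side information


def _layer_below_lowest_invalid(valid: Sequence[bool]) -> int:
    """1-based index of the layer just below the lowest invalid layer."""
    for position, received in enumerate(valid):
        if not received:
            return position
    return len(valid)


def select_layer_indices(
    valid: Sequence[bool],
    previous_first: int,
    frames_independent: bool,
) -> Tuple[int, int]:
    """Return (first layer index, second layer index) for the given frame."""
    received = _layer_below_lowest_invalid(valid)

    if frames_independent:
        # Frames decode on their own: use every received layer and its
        # matching portion of enhancement side information.
        return received, received

    # Frames depend on each other: never exceed the preceding frame's index.
    first = min(previous_first, received)
    # Enhancement is applied only if the first layer index did not change.
    second = first if first == previous_first else NO_ENHANCEMENT
    return first, second
```

For dependent frames this means that a lost layer lowers the first layer index immediately and disables the enhancement for that frame, while a later recovery of the layer does not raise the index again within the same run of dependent frames.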

In embodiments, the base layer may include at least one portion of additional basic side information corresponding to a respective layer and including information that specifies decoding of one or more components among the components assigned to the respective layer in dependence on other components assigned to the respective layer and any layers lower than the respective layer. The method may further include, for each portion of additional basic side information, decoding the portion of additional basic side information by referring to the components assigned to its respective layer and any layers lower than the respective layer. The method may further include correcting the portion of additional basic side information by referring to the components assigned to the highest usable layer and any layers between the highest usable layer and the respective layer. The basic reconstructed sound representation may be obtained from the components assigned to the highest usable layer and any layers lower than the highest usable layer, using the basic side information and corrected portions of additional basic side information obtained from portions of additional basic side information corresponding to layers up to the highest usable layer.

According to another aspect, a method of decoding a compressed sound representation of a sound or sound field is described. The compressed sound representation may have been encoded in a plurality of hierarchical layers. The plurality of hierarchical layers may include a base layer and one or more hierarchical enhancement layers. The plurality of layers may have assigned thereto components of a basic compressed sound representation of a sound or sound field. In other words, the plurality of layers may include the components of the basic compressed sound representation. The components may be assigned to respective layers in respective groups of components. The plurality of components may be complementary components. The base layer may include basic side information for decoding the basic compressed sound representation. The base layer may further include at least one portion of additional basic side information corresponding to a respective layer and including information that specifies decoding of one or more components among the components assigned to the respective layer in dependence on other components assigned to the respective layer and any layers lower than the respective layer. The method may include receiving data payloads respectively corresponding to the plurality of hierarchical layers. The method may further include determining a first layer index indicating a highest usable layer among the plurality of layers to be used for decoding the basic compressed sound representation to the basic reconstructed sound representation of the sound or sound field. The method may further include, for each portion of additional basic side information, decoding the portion of additional basic side information by referring to the components assigned to its respective layer and any layers lower than the respective layer. The method may further include, for each portion of additional basic side information, correcting the portion of additional basic side information by referring to the components assigned to the highest usable layer and any layers between the highest usable layer and the respective layer. The basic reconstructed sound representation may be obtained from the components assigned to the highest usable layer and any layers lower than the highest usable layer, using the basic side information and corrected portions of additional basic side information obtained from portions of additional basic side information corresponding to layers up to the highest usable layer. The method may further comprise determining a second layer index that is either equal to the first layer index or that indicates omission of enhancement side information during decoding.

Configured as such, the proposed method ensures that the additional basic side information that is eventually used for decoding the basic compressed sound representation does not include redundant elements, thereby rendering the actual decoding of the basic compressed sound representation more efficient.

According to another aspect, an encoder for layered encoding of a compressed sound representation of a sound or sound field is described. The compressed sound representation may include a basic compressed sound representation that includes a plurality of components. The plurality of components may be complementary components. The compressed sound representation may further include basic side information for decoding the basic compressed sound representation to a basic reconstructed sound representation of the sound or sound field. The compressed sound representation may yet further include enhancement side information including parameters for improving (e.g., enhancing) the basic reconstructed sound representation. The encoder may include a processor configured to perform some or all of the method steps of the methods according to the first-mentioned above aspect and the second-mentioned above aspect.

According to another aspect, a decoder for decoding a compressed sound representation of a sound or sound field is described. The compressed sound representation may have been encoded in a plurality of hierarchical layers. The plurality of hierarchical layers may include a base layer and one or more hierarchical enhancement layers. The plurality of layers may have assigned thereto components of a basic compressed sound representation of a sound or sound field. In other words, the plurality of layers may include the components of the basic compressed sound representation. The components may be assigned to respective layers in respective groups of components. The plurality of components may be complementary components. The base layer may include basic side information for decoding the basic compressed sound representation. Each layer may include a portion of enhancement side information including parameters for improving (e.g., enhancing) a basic reconstructed sound representation obtainable from data included in the respective layer and any layers lower than the respective layer. The decoder may include a processor configured to perform some or all of the method steps of the methods according to the third-mentioned above aspect and the fourth-mentioned above aspect.

According to other aspects, methods, apparatuses and systems are directed to decoding a compressed Higher Order Ambisonics (HOA) sound representation of a sound or sound field. The apparatus may have a receiver configured to receive, or the method may receive, a bit stream containing the compressed HOA representation corresponding to a plurality of hierarchical layers that include a base layer and one or more hierarchical enhancement layers. The plurality of layers have assigned thereto components of a basic compressed sound representation of the sound or sound field, the components being assigned to respective layers in respective groups of components. The apparatus may have a decoder configured to decode, or the method may decode, the compressed HOA representation based on basic side information that is associated with the base layer and based on enhancement side information that is associated with the one or more hierarchical enhancement layers. The basic side information may include basic independent side information related to first individual monaural signals that will be decoded independently of other monaural signals. Each of the one or more hierarchical enhancement layers may include a portion of the enhancement side information including parameters for improving a basic reconstructed sound representation obtainable from data included in the respective layer and any layers lower than the respective layer.

The basic independent side information may indicate that the first individual monaural signals represent a directional signal with a direction of incidence. The basic side information may further include basic dependent side information related to second individual monaural signals that will be decoded dependently of other monaural signals. The basic dependent side information may include vector based signals that are directionally distributed within the sound field, where the directional distribution is specified by means of a vector. The components of the vector are set to zero and are not part of the compressed vector representation.


The components of the basic compressed sound representation may correspond to monaural signals that represent either predominant sound signals or coefficient sequences of an HOA representation. The bit stream includes data payloads respectively corresponding to the plurality of hierarchical layers. The enhancement side information may include parameters related to at least one of: spatial prediction, sub-band directional signals synthesis, and parametric ambience replication. The enhancement side information may include information that allows prediction of missing portions of the sound or sound field from directional signals. There may be further determined, for each layer, whether the respective layer has been validly received and a layer index of a layer immediately below a lowest layer that has not been validly received.

According to another aspect, a software program is described. The software program may be adapted for execution on a processor and for performing some or all of the method steps outlined in the present document when carried out on a computing device.

According to yet another aspect, a storage medium is described. The storage medium may comprise a software program adapted for execution on a processor and for performing some or all of the method steps outlined in the present document when carried out on a computing device.

Statements made with regard to any of the above aspects or its embodiments also apply to respective other aspects or their embodiments, as the skilled person will appreciate. Repeating these statements for each and every aspect or embodiment has been omitted for reasons of conciseness.

The methods and apparatuses including their preferred embodiments as outlined in the present document may be used stand-alone or in combination with the other methods and systems disclosed in this document. Furthermore, all aspects of the methods and apparatus outlined in the present document may be arbitrarily combined. In particular, the features of the claims may be combined with one another in an arbitrary manner.

Method steps and apparatus features may be interchanged in many ways. In particular, the details of the disclosed method can be implemented as an apparatus adapted to execute some or all of the steps of the method, and vice versa, as the skilled person will appreciate.

DESCRIPTION OF THE DRAWINGS

The invention is explained below in an exemplary manner with reference to the accompanying drawings, wherein:

Fig. 1 is a flow chart illustrating an example of a method of layered encoding according to embodiments of the disclosure;

Fig. 2 is a block diagram schematically illustrating an example of an encoder stage according to embodiments of the disclosure;


Fig. 3 is a flow chart illustrating an example of a method of decoding a compressed sound representation of a sound or sound field that has been encoded to a plurality of hierarchical layers, according to embodiments of the disclosure;

Fig. 4A and Fig. 4B are block diagrams schematically illustrating examples of a decoder stage according to embodiments of the disclosure;

Fig. 5 is a block diagram schematically illustrating an example of a hardware implementation of an encoder according to embodiments of the disclosure; and

Fig. 6 is a block diagram schematically illustrating an example of a hardware implementation of a decoder according to embodiments of the disclosure.

DETAILED DESCRIPTION

First, a compressed sound (or sound field) representation (henceforth referred to as compressed sound representation for brevity) to which methods and encoders/decoders according to the present disclosure are applicable will be described. In general, the complete compressed sound (or sound field) representation (henceforth referred to as complete compressed sound representation for brevity) may comprise (e.g., consist of) the three following components: a basic compressed sound (or sound field) representation (henceforth referred to as basic compressed sound representation for brevity), basic side information, and enhancement side information.

The basic compressed sound representation itself comprises (e.g., consists of) a number of components (e.g., complementary components). The basic compressed sound representation may account for the distinctively largest percentage of the complete compressed sound representation. The basic compressed sound representation may consist of monaural transport signals representing either predominant sound signals or coefficient sequences of the original HOA representation.

The basic side information is needed to decode the basic compressed sound representation and may be assumed to be of a much smaller size compared to the basic compressed sound representation. It may be made up for the greatest part of disjoint portions, each of which specifies the decompression of only one particular component of the basic compressed sound representation. The basic side information may comprise a first part that may be known as independent basic side information and a second part that may be known as additional basic side information.

Both the first and second parts, the independent basic side information and the additional basic side information, may specify the decompression of particular components of the basic compressed sound representation. The second part is optional and may be omitted. In this case, the compressed sound representation may be said to comprise the first part (e.g., basic side information).

The first part (e.g., basic side information) may contain side information describing individual (complementary) components of the basic compressed sound representation independently of other (complementary) components. In particular, the first part (e.g., basic side information) may specify decoding of one or more of the plurality of components individually, independently of other components. Thus, the first part may be referred to as independent basic side information.

The second (optional) part may contain side information, also known as additional basic side information, which may describe individual (complementary) components of the basic compressed sound representation in dependence on other (complementary) components. This second part may also be referred to as dependent basic side information. In particular, the dependence may have the following properties:

- The dependent basic side information for each individual (complementary) component of the basic compressed sound representation may attain its greatest extent when no certain other (complementary) components are contained in the basic compressed sound representation.

- In case that additional certain (complementary) components are added to the basic compressed sound representation, the dependent basic side information for the considered individual (complementary) component may become a subset of the original dependent basic side information, thereby reducing its size.

The enhancement side information is also optional. It may be used to improve or enhance (e.g., parametrically improve or enhance) the basic compressed sound representation. Its size may also be assumed to be much smaller than that of the basic compressed sound representation.

Thus, in embodiments the compressed sound representation may comprise a basic compressed sound representation comprising a plurality of components, basic side information for decoding (e.g., decompressing) the basic compressed sound representation to a basic reconstructed sound representation of the sound or sound field, and enhancement side information including parameters for improving or enhancing (e.g., parametrically improving or enhancing) the basic reconstructed sound representation. The compressed sound representation may further comprise additional basic side information for decoding (e.g., decompressing) the basic compressed sound representation to the basic reconstructed sound representation, which may include information that specifies decoding of one or more of the plurality of components in dependence on respective other components.

One example of such a type of complete compressed sound representation is given by the compressed Higher Order Ambisonics (HOA) sound field representation as specified by the preliminary version of the MPEG-H 3D audio standard (Reference 1), Chapter 12 and Annex C.5. That is, the compressed sound representation may correspond to a compressed HOA sound (or sound field) representation of a sound or sound field.

For this example, the basic compressed sound field representation (basic compressed sound representation) may comprise (e.g., may be identified with) a number of components.


The components may be (e.g., correspond to) monaural signals. The monaural signals may be quantized monaural signals. The monaural signals may represent either predominant sound signals or coefficient sequences of an ambient HOA sound field component.

The basic side information may describe, amongst others, for each of these monaural signals how it spatially contributes to the sound field. For instance, the basic side information may specify a predominant sound signal as a purely directional signal, meaning a general plane wave with a certain direction of incidence. Alternatively, the basic side information may specify a monaural signal as a coefficient sequence of the original HOA representation having a certain index. The basic side information may be further separated into a first part and a second part, as indicated above.

The first part is side information (e.g., independent basic side information) related to specific individual monaural signals. This independent basic side information is independent of the existence of other monaural signals. Such side information may for instance specify a monaural signal to represent a directional signal (e.g., meaning a general plane wave) with a certain direction of incidence. Alternatively, a monaural signal may be specified as a coefficient sequence of the original HOA representation having a certain index. The first part may be referred to as independent basic side information. In general, the first part (e.g., basic side information) may specify decoding of one or more of the plurality of monaural signals individually, independently of other monaural signals.

The second part is side information (e.g., additional basic side information) related to specific individual monaural signals. This side information is dependent on the existence of other monaural signals. Such side information may be utilized, for example, if monaural signals are specified to be vector based signals (see, e.g., Reference 1, Section 12.4.2.4.4). These signals are directionally distributed within the sound field, where the directional distribution may be specified by means of a vector. In a certain mode (see, e.g., CodedVVecLength = 1), particular components of this vector are implicitly set to zero and are not part of the compressed vector representation. These components are those with indices equal to those of coefficient sequences of the original HOA representation that are part of the basic compressed sound representation. That means that if individual components of the vector are coded, their total number may depend on the basic compressed sound representation. In particular, the total number may depend on which coefficient sequences of the original HOA representation it contains.

If no coefficient sequences of the original HOA representation are contained in the basic compressed sound representation, the dependent basic side information for each vector-based signal consists of all the vector components and has its greatest size. In case that coefficient sequences of the original HOA representation with certain indices are added to the basic compressed sound representation, the vector components with those indices are removed from the side information for each vector-based signal, thereby reducing the size of the dependent basic side information for the vector-based signals.
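The size reduction described above may be illustrated by a minimal sketch. The function name `coded_vector_indices`, the 1-based indexing and the example count of nine coefficient sequences are assumptions made purely for illustration and are not taken from Reference 1.

```python
def coded_vector_indices(num_coeffs, hoa_indices_in_basic_repr):
    """Return the vector component indices that remain in the side information.

    num_coeffs: total number of HOA coefficient sequences (hypothetical,
        1-based indexing assumed for this sketch).
    hoa_indices_in_basic_repr: indices of coefficient sequences of the
        original HOA representation carried in the basic compressed sound
        representation.
    """
    omitted = set(hoa_indices_in_basic_repr)
    # Components whose index matches a transmitted coefficient sequence are
    # implicitly zero and are therefore not coded.
    return [i for i in range(1, num_coeffs + 1) if i not in omitted]


# No coefficient sequences present: every component is coded (greatest size).
full = coded_vector_indices(9, [])
# Adding coefficient sequences 1 to 4 removes those components.
reduced = coded_vector_indices(9, [1, 2, 3, 4])
```

In this sketch, `full` holds nine indices while `reduced` holds only the five indices not covered by transmitted coefficient sequences.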


The enhancement side information may comprise parameters related to the (broadband) spatial prediction (see Reference 1, Section 12.4.2.4.3) and/or parameters related to the Sub-band Directional Signals Synthesis and the Parametric Ambience Replication.

The parameters related to the (broadband) spatial prediction may be used to (linearly) predict missing portions of the sound field from the directional signals.

The Sub-band Directional Signals Synthesis and the Parametric Ambience Replication are compression tools that were recently introduced into the MPEG-H 3D audio standard with the amendment [see Reference 2, Section 1]. These two tools allow a frequency-dependent parametric prediction of additional monaural signals to be spatially distributed in order to complement a spatially incomplete or deficient compressed HOA representation. The prediction may be based on coefficient sequences of the basic compressed sound representation.

It is important to note that the aforementioned complementary contribution to the sound field is represented within the compressed HOA representation not by means of additional quantized signals, but rather by means of extra side information of a comparably much smaller size. Hence, the two mentioned coding tools are especially suited for the compression of HOA representations at low data rates.

A second example of a compressed representation of one or more monaural signals with the above-mentioned structure may comprise coded spectral information for disjoint frequency bands up to a certain upper frequency, which can be regarded as a basic compressed representation; basic side information specifying the coded spectral information (e.g., by the number and width of coded frequency bands); and enhancement side information comprising (e.g., consisting of) parameters of a Spectral Band Replication (SBR) that describe how to parametrically reconstruct from the basic compressed representation the spectral information for higher frequency bands which are not considered in the basic compressed representation.

The present disclosure proposes a method for the layered coding of a complete compressed sound (or sound field) representation having the aforementioned structure.

The compression may be frame based in the sense that it provides compressed representations (in the form of data packets or equivalently frame payloads) for successive time intervals. The time intervals may have equal or different sizes. These data packets may be assumed to contain a validity flag, a value indicating their size as well as the actual compressed representation data. In the following, without intended limitation, it will be assumed that the compression is frame based. Further, unless indicated otherwise and without intended limitation, the description will focus on the treatment of a single frame, and hence the frame index will be omitted.

Each frame payload of the complete compressed sound (or sound field) representation under consideration is assumed to contain J data packets (or frame payloads), each for one component of a basic compressed sound representation, which are denoted by BSRC_j, j = 1, ..., J. Further, it is assumed to contain a packet with independent basic side information (basic side information) denoted by BSI_I specifying particular components BSRC_j of the basic compressed sound representation independently of other components. Optionally, it may additionally be assumed to contain a packet with dependent basic side information (additional basic side information) denoted by BSI_D specifying particular components BSRC_j of the basic compressed sound representation in dependence on other components.

The information contained within the two data packets BSI_I and BSI_D may be optionally grouped into one single data packet BSI of basic side information. The single data packet BSI might be said to contain, amongst others, J portions, each of which specifies one particular component BSRC_j of the basic compressed sound representation. Each of these portions in turn may be said to contain a portion of independent side information and, optionally, a portion of dependent side information.

Finally, it may include an enhancement side information payload (enhancement side information) denoted by ESI with a description of how to improve or enhance the reconstructed sound (or sound field) from the complete basic compressed sound representation.

The proposed solution for layered coding addresses the steps required for both the compression part, including the packing of data packets for transmission, and the receiver and decompression part. Each part will be described in detail in the following.

First, compression and packing (e.g., for transmission) will be described. In particular, components and elements of the complete compressed sound (or sound field) representation in case of layered coding will be described.

Fig. 1 schematically illustrates a flowchart of an example of a method for compression and packing (e.g., an encoding method, or a method of layered encoding of a compressed sound representation of a sound or sound field). The assignment (e.g., allocation) of the individual payloads to the base layer and (M - 1) enhancement layers may be accomplished by a transport layers packer. Fig. 2 schematically illustrates a block diagram of an example of the assignment/allocation of the individual payloads.

As indicated above, the complete compressed sound representation 2100 may relate for example to a compressed HOA representation comprising a basic compressed sound representation. The complete compressed sound representation 2100 may comprise a plurality of components (e.g., monaural signals) 2110-1, ..., 2110-J, independent basic side information (basic side information) 2120, optional enhancement side information (enhancement side information) 2140, and optional dependent basic side information (additional basic side information) 2130. The basic side information 2120 may be information for decoding the basic compressed sound representation to a basic reconstructed sound representation of the sound or sound field. The basic side information 2120 may include information that specifies decoding of one or more components (e.g., monaural signals) individually, independently of other components. The enhancement side information 2140 may include parameters for improving (e.g., enhancing) the basic reconstructed sound representation. The additional basic side information 2130 may be (further) information for decoding the basic compressed sound representation to the basic reconstructed sound representation, and may include information that specifies decoding of one or more of the plurality of components in dependence on respective other components.

Fig. 2 illustrates an underlying assumption where there are a plurality of hierarchical layers, including one base layer (basic layer) and one or more (hierarchical) enhancement layers. For example, there may be M layers in total, i.e., one base layer and M - 1 enhancement layers. The plurality of hierarchical layers have a successively increasing layer index. The lowest value of the layer index (e.g., layer index 1) corresponds to the base layer. It is further understood that the layers are ordered, from the base layer, through the enhancement layers, up to the overall highest enhancement layer (i.e., the overall highest layer).

The proposed method may be performed on a frame basis (i.e., in a frame-wise manner). In particular, the compressed sound representation 2100 may be compressed for successive time intervals, for example time intervals of equal size. Each time interval may correspond to a frame. The steps described below may be performed for each successive time interval (e.g., frame).

At S1010 in Fig. 1, the plurality of components 2110 are sub-divided into a plurality of groups of components. Each of the plurality of groups is then assigned (e.g., added, or allocated) to a respective one of a plurality of hierarchical layers. Therein, the number of groups corresponds to the number of layers. For example, the number of groups may be equal to the number of layers, so that there is one group of components for each layer. As indicated above, the plurality of layers may include a base layer and one or more (e.g., M - 1) hierarchical enhancement layers.

In other words, the basic compressed sound representation is subdivided into parts to be assigned to the individual layers. Without loss of generality, the grouping can be described by M + 1 numbers J_m, m = 0, ..., M, with J_0 = 1 and J_M = J + 1, such that component BSRC_j is assigned to the m-th layer for J_{m-1} ≤ j < J_m.
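The grouping by boundary numbers may be sketched as follows, assuming that component BSRC_j belongs to the m-th layer when J_{m-1} <= j < J_m, with J_0 = 1 and J_M = J + 1. The function names and the example boundaries are hypothetical and serve only as an illustration.

```python
from bisect import bisect_right


def layer_of_component(j, boundaries):
    """Return the 1-based layer index m with J_{m-1} <= j < J_m.

    boundaries: [J_0, J_1, ..., J_M] with J_0 == 1 and J_M == J + 1.
    """
    if not (boundaries[0] <= j < boundaries[-1]):
        raise ValueError("component index out of range")
    # bisect_right counts the boundaries that are <= j, which is exactly m.
    return bisect_right(boundaries, j)


def group_components(boundaries):
    """Return a dict mapping each layer index m to its component indices."""
    return {
        m: list(range(boundaries[m - 1], boundaries[m]))
        for m in range(1, len(boundaries))
    }


# Example: J = 8 components split over M = 3 layers.
example_boundaries = [1, 3, 6, 9]
example_groups = group_components(example_boundaries)
```

With these example boundaries, components 1 and 2 form the base layer, components 3 to 5 the first enhancement layer, and components 6 to 8 the second enhancement layer.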

At S1020, the groups of components are assigned to their respective layers. At S1030, the basic side information 2120 is added (e.g., allocated) to the base layer (i.e., the lowest one of the plurality of hierarchical layers).

That is, due to its small size it is proposed to include the complete basic side information (basic side information and optional additional basic side information) in the base layer to avoid its unnecessary fragmentation.

If the compressed sound representation under consideration comprises dependent basic side information (additional basic side information), the method may further comprise (not shown in Fig. 1) decomposing the additional basic side information into a plurality of portions 2130-1, ..., 2130-M of additional basic side information. The portions of additional basic side information may then be added (e.g., allocated) to the base layer. In other words, the portions of additional basic side information may be included in the base layer. Each portion of additional basic side information may correspond to a respective layer and may include information that specifies decoding of one or more components assigned to the respective layer in dependence on other components assigned to the respective layer and any layers lower than the respective layer.

Thus, while the independent basic side information BSI_I (basic side information) 2120 is left unchanged for the assignment, the dependent basic side information has to be handled specially for layered coding, in order to allow a correct decoding at the receiver side on the one hand, and to reduce the size of the dependent basic side information to be transmitted on the other hand. It is proposed to decompose the dependent basic side information into M parts (portions) denoted by BSI_{D,m}, m = 1, ..., M, where the m-th part contains dependent basic side information for each of the components BSRC_j of the basic compressed sound representation assigned to the m-th layer, assuming that the optional dependent basic side information exists for the compressed sound representation under consideration. In case the respective dependent side information does not exist for the compressed sound representation, the parts BSI_{D,m} may be assumed to be empty. Each part of dependent basic side information BSI_{D,m} may be dependent on all components BSRC_j', 1 ≤ j' < J_m, contained in all of the layers up to the m-th one (i.e., contained in all layers m' = 1, ..., m).
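For the HOA example, one possible reading of this decomposition is sketched below: the vector components coded for the vector-based signals of layer m omit only those indices whose coefficient sequences are carried in layers 1 to m, so that the portion remains decodable when layer m is the highest usable layer. This reading, the function name `dependent_portions` and the 1-based indexing are assumptions made for illustration only.

```python
def dependent_portions(num_coeffs, coeff_indices_per_layer):
    """Sketch of per-layer dependent side information for vector-based signals.

    num_coeffs: total number of HOA coefficient sequences (hypothetical).
    coeff_indices_per_layer: element m-1 lists the indices of coefficient
        sequences of the original HOA representation carried in layer m.
    Returns a dict mapping layer m to the vector component indices that
    would still be coded for vector-based signals assigned to layer m.
    """
    seen = set()
    portions = {}
    for m, indices in enumerate(coeff_indices_per_layer, start=1):
        # Only coefficient sequences in layers 1..m may be relied upon.
        seen |= set(indices)
        portions[m] = [i for i in range(1, num_coeffs + 1) if i not in seen]
    return portions


example_portions = dependent_portions(4, [[1], [2, 3]])
```

In this example the portion for the base layer keeps components 2, 3 and 4, whereas the portion for the second layer keeps only component 4, which reflects the size reduction that becomes possible as more layers are available.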

If the independent basic side information packet BSI_I is of negligibly small size, it is reasonable to keep it as a whole and add (assign) it to the base layer. Optionally, a similar decomposition as for the dependent basic side information can also be done for the independent basic side information, providing the packets BSI_{I,m}, m = 1, ..., M. This is useful to reduce the size of the base layer by adding (assigning) parts of the independent basic side information to layers with the corresponding components of the basic compressed sound representation.

At S1040, a plurality of portions 2140-1, ..., 2140-M of enhancement side information may be determined. Each portion of enhancement side information may include parameters for improving (e.g., enhancing) a reconstructed sound representation obtainable from data included in the respective layer and any layers lower than the respective layer.

The reason for performing this step is that in the case of layered coding it is important to realize that the enhancement side information has to be computed separately for each layer, since it is intended to enhance the preliminary decompressed sound (or sound field), which however is dependent on the available layers for decompression. In particular, the preliminary decompressed sound (or sound field) for a given highest decodable layer (highest usable layer) depends on the components included in the highest decodable layer and any layers below the highest decodable layer. Hence, the compression has to provide M individual enhancement side information data packets (portions of enhancement side information), denoted by ESI_m, m = 1, ..., M, where the enhancement side information in the m-th data packet ESI_m is computed such as to enhance the sound (or sound field) representation obtained from all data contained in the base layer and enhancement layers with indices lower than m (e.g., all data contained in the m-th layer and any layers below the m-th layer).
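The per-layer dependence can be sketched by listing, for each layer m, the components that are available when layer m is the highest decodable layer; the m-th portion of enhancement side information would be estimated against a reconstruction from exactly these components. The estimation itself is not shown, and the function name `components_available_up_to` is a hypothetical helper introduced for illustration.

```python
def components_available_up_to(groups):
    """Map each layer m to the components of layers 1..m.

    groups: dict mapping layer index m (1 = base layer) to the list of
        component indices assigned to that layer.
    """
    available = []
    result = {}
    for m in sorted(groups):
        # Layers are hierarchical, so layer m builds on all lower layers.
        available = available + list(groups[m])
        result[m] = list(available)
    return result


example_inputs = components_available_up_to({1: [1, 2], 2: [3], 3: [4, 5]})
```

Because each entry differs from the others, a single set of enhancement parameters could not match every possible highest decodable layer, which is why M separate portions are provided.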

At S1050, the plurality of portions 2140-1, ..., 2140-M of enhancement side information are assigned (e.g., added, or allocated) to the plurality of layers. Each of the plurality of portions of enhancement side information is assigned to a respective one of the plurality of layers. For example, each of the plurality of layers includes a respective portion of enhancement side information.

The assignment of basic and/or enhancement side information to respective layers may be indicated in configuration information that is generated by the encoding method. In other words, the correspondence between the basic and/or enhancement side information and respective layers may be indicated in the configuration information. Further, the configuration information may indicate, for each layer, the components of the basic compressed sound representation that are assigned to (e.g., included in) that layer. The portions of additional basic side information are included in the base layer, yet may correspond to layers different from the base layer.

Summing up, at the compression stage a frame data packet, denoted by FRAME, is provided that has the following composition:

FRAME = [BSRC_1 ... BSRC_J BSI_I BSI_{D,1} ... BSI_{D,M} ESI_1 ... ESI_M]   (1)

Further, the packets BSI_I and BSI_{D,m} for m = 1, ..., M might be combined into a single packet BSI, in which case the frame data packet, denoted by FRAME, would have the following composition:

FRAME = [BSRC_1 BSRC_2 ... BSRC_J BSI ESI_1 ESI_2 ... ESI_M]   (2)

The ordering of the individual payloads within the frame data packet may generally be arbitrary.

The individual data packets may then be grouped within payloads, which are defined as special data packets that contain a validity flag, a value indicating their size as well as the actual compressed representation data. The usage of payloads allows a simple de-multiplex at the receiver side, offering the advantage of being able to discard obsolete payloads, without the requirement to parse them through. One possible grouping is given by

- assigning (e.g., allocating) each BSRC_j packet, j = 1, ..., J, to an individual payload denoted by BP_j;

- assigning (e.g., allocating) the m-th enhancement side information data packet ESI_m and the m-th dependent side information data packet BSI_{D,m} to one enhancement payload denoted by EP_m, m = 1, ..., M; and

- assigning the independent basic side information packet BSI_I to a separate side information payload denoted by BSIP.

Optionally, if the size of the independent basic side information is large, each m-th of its components, BSI_{I,m}, m = 1, ..., M, may be assigned (e.g., allocated) to the enhancement payload EP_m. In this case, the side information payload BSIP is empty and can be ignored.

Another option is to assign all dependent basic side information data packets BSI_{D,m} to the side information payload BSIP, which is reasonable if the size of the dependent basic side information is small.

Finally, a frame data packet, denoted by FRAME, may be provided having the following composition:

FRAME = [BP_1 ... BP_J BSIP EP_1 ... EP_M]   (3)

The ordering of the individual payloads within the frame data packet may be generally arbitrary.

The method may further comprise (not shown in Fig. 1) generating, for each of the plurality of layers, a transport layer packet (e.g., a base layer packet 2200 and M - 1 enhancement layer packets 2300-1, ..., 2300-(M - 1)) including the data of the respective layer (e.g., components, basic side information and enhancement side information for the base layer, or components and enhancement side information for the one or more enhancement layers).
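A transport layers packer along these lines may be sketched as follows. The dictionary layout, the function name `build_layer_packets` and the string placeholders for payloads are assumptions made for illustration; they do not represent a defined packet syntax.

```python
def build_layer_packets(boundaries, bsip, ep, bp):
    """Group payloads into M transport layer packets (illustrative only).

    boundaries: [J_0, ..., J_M] with J_0 == 1 and J_M == J + 1.
    bsip: side information payload, placed in the base layer only.
    ep: list of the M enhancement payloads EP_1..EP_M.
    bp: list of the J component payloads BP_1..BP_J.
    """
    packets = []
    for m in range(1, len(boundaries)):
        first, end = boundaries[m - 1], boundaries[m]
        # Component payloads BP_j with J_{m-1} <= j < J_m belong to layer m.
        packet = {"layer": m, "EP": ep[m - 1], "BP": bp[first - 1:end - 1]}
        if m == 1:
            packet["BSIP"] = bsip  # basic side information stays in the base layer
        packets.append(packet)
    return packets


example_packets = build_layer_packets(
    [1, 3, 5], "BSIP", ["EP1", "EP2"], ["BP1", "BP2", "BP3", "BP4"]
)
```

Here the base layer packet carries BSIP, EP1, BP1 and BP2, while the single enhancement layer packet carries EP2, BP3 and BP4.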

The transport layer packets for different layers may have different priorities of transmission. Thus, the method may further comprise (not shown in Fig. 1) generating a transport stream for transmission of the data of the plurality of layers, wherein the base layer has highest priority of transmission and the hierarchical enhancement layers have decremental priorities of transmission. Therein, higher priority of transmission may correspond to a greater extent of error protection, and vice versa.

Unless steps require certain other steps as prerequisites, the aforementioned steps may be performed in any order and the exemplary order illustrated in Fig. 1 is understood to be non-limiting.

Fig. 3 illustrates a method (e.g., a method of decoding a compressed sound representation of a sound or sound field) for decoding or decompression (unpacking). Examples of the corresponding receiver and decompression stage are schematically illustrated in the block diagrams of Fig. 4A and Fig. 4B.

As follows from the above, the compressed sound representation may be encoded in the plurality of hierarchical layers. The plurality of layers may have assigned thereto (e.g., may include) the components of the basic compressed sound representation, the components being assigned to respective layers in respective groups of components. The base layer may include the basic side information for decoding the basic compressed sound representation. Each layer may include one of the aforementioned portions of enhancement side information including parameters for improving a basic reconstructed sound representation obtainable from data included in the respective layer and any layers lower than the respective layer.

The proposed method may be performed on a frame basis (i.e., in a frame-wise manner). In particular, a restored representation of the sound or sound field may be generated for successive time intervals, for example time intervals of equal size. The time intervals may be frames, for example. The steps described below may be performed for each successive time interval (e.g., frame).

At S3010, data payloads (e.g., transport layer packets) corresponding to the plurality of layers are received. The data payloads may be received as part of a bitstream that contains the compressed HOA representation of a sound or a sound field, the representation corresponding to the plurality of hierarchical layers. The hierarchical layers include a base layer and one or more hierarchical enhancement layers. The plurality of layers have assigned thereto components of a basic compressed sound representation of the sound or sound field. The components are assigned to respective layers in respective groups of components.

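The individual layer packets may be multiplexed to provide the received frame packet of the complete compressed sound representation. The received frame packet may be indicated by

$$\left[\, \mathrm{BSI}_{\mathrm{I}} \;\; \mathrm{BSI}_{\mathrm{D}_1} \,\ldots\, \mathrm{BSI}_{\mathrm{D}_M} \;\; \mathrm{ESI}_1 \;\; \mathrm{BSRC}_1 \,\ldots\, \mathrm{BSRC}_{J_1-1} \;\ldots\; \mathrm{ESI}_M \;\; \mathrm{BSRC}_{J_{M-1}} \,\ldots\, \mathrm{BSRC}_J \,\right] \tag{4}$$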

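In the alternative case of the packets BSI_I and BSI_Dm for m = 1, ..., M being combined into a single packet BSI, the individual layer packets may be multiplexed to provide the received frame packet of the complete compressed sound representation indicated by

$$\left[\, \mathrm{BSI} \;\; \mathrm{ESI}_1 \;\; \mathrm{BSRC}_1 \,\ldots\, \mathrm{BSRC}_{J_1-1} \;\ldots\; \mathrm{ESI}_M \;\; \mathrm{BSRC}_{J_{M-1}} \,\ldots\, \mathrm{BSRC}_J \,\right] \tag{5}$$

In terms of payloads, the received frame packet may be given by a corresponding expression (6), which is not legible in the source text.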

The received frame packet may then be passed to a decompressor or decoder 4100. If the transmission of an individual layer has been error-free, the validity flag of at least the contained enhancement side information payload ËP_m (e.g., corresponding to a portion of enhancement side information) is set to true. In case of an error due to transmission of an individual layer, the validity flag within at least the enhancement side information payload in this layer is set to false. Hence, the validity of a layer packet can be determined from the validity of the contained enhancement side information payload (e.g., from its validity flag).

In the decompressor 4100, the received frame packet may be de-multiplexed. For this purpose, the information about the size of each payload may be exploited to avoid unnecessary parsing through the data of the individual payloads.
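The size-driven de-multiplexing can be pictured with a minimal sketch. It assumes, hypothetically, that the payload sizes of a frame packet are already known to the decompressor (the description does not state where they are carried); the function name `demultiplex` and the byte strings are illustrative only.

```python
from typing import List, Sequence


def demultiplex(frame: bytes, sizes: Sequence[int]) -> List[bytes]:
    """Cut a frame packet into payloads using only the known payload sizes."""
    if any(size < 0 for size in sizes) or sum(sizes) != len(frame):
        raise ValueError("payload sizes are inconsistent with the frame length")
    view = memoryview(frame)  # payloads are located by offset, not by parsing
    payloads, offset = [], 0
    for size in sizes:
        payloads.append(bytes(view[offset:offset + size]))
        offset += size
    return payloads


# Hypothetical frame: basic side info, then (ESI_m, components) for two layers.
frame = b"BSI" + b"E1" + b"c1c2" + b"E2" + b"c3"
parts = demultiplex(frame, [3, 2, 4, 2, 2])
```

Because each payload is located purely by offset, the contents of payloads that are later discarded never need to be inspected.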

At S3020, a first layer index indicating a highest layer (e.g., highest usable layer, or highest decodable layer) is determined from among the plurality of layers to be used for decoding the basic compressed sound representation to the basic reconstructed sound representation of the sound or sound field.

Moreover, at S3020, there may be selected the value (e.g., layer index) N_B of the highest layer (highest usable layer) that will be used for decompression of the basic sound representation. The highest enhancement layer to be actually used for decompression of the basic sound representation is given by N_B - 1. Since each layer contains exactly one enhancement side information payload (portion of enhancement side information), it may be determined based on the enhancement side information payload whether or not the containing layer is valid (e.g., has been validly received). Hence, the selection can be accomplished using all enhancement side information payloads ESI_m, m = 1, ..., M (or correspondingly, ËP_m, m = 1, ..., M).

At S3030, a basic reconstructed sound representation is obtained. The basic reconstructed sound representation may be obtained from components assigned to the highest usable layer indicated by the first layer index and any layers lower than this highest usable layer, using the basic side information.

The payloads of the basic compressed sound representation components BSRC_1, ..., BSRC_J may be provided, along with (all of) the basic side information payloads (e.g., BSI, or BSI_I and BSI_Dm, m = 1, ..., M) and the value N_B, to a Basic Representation Decompression processing unit 4200. The Basic Representation Decompression processing unit 4200 (illustrated in Figs. 4A and 4B) reconstructs the basic sound (or sound field) representation using only those basic compressed sound representation components contained within the lowest N_B layers, that is, the base layer and N_B - 1 enhancement layers (i.e., the layers up to the layer indicated by the first layer index). Alternatively, only the payloads of the basic compressed sound representation components contained in the lowest N_B layers, together with the respective basic side information payloads, may be provided to the Basic Representation Decompression processing unit 4200.
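The restriction to the lowest N_B layers can be illustrated with a small sketch. It assumes the boundary convention used elsewhere in this description, namely that the first m layers hold the components BSRC_1, ..., BSRC_{J_m - 1}; the function name and the example layout are hypothetical.

```python
from typing import List, Sequence


def components_up_to_layer(j_bounds: Sequence[int], n_b: int) -> List[int]:
    """Return the 1-based indices of the components held by the lowest n_b layers.

    j_bounds[m - 1] is J_m, i.e. the first m layers hold BSRC_1 .. BSRC_(J_m - 1).
    """
    if not 0 <= n_b <= len(j_bounds):
        raise ValueError("n_b must lie between 0 and the number of layers")
    if n_b == 0:
        return []  # no usable layer, nothing to decompress
    return list(range(1, j_bounds[n_b - 1]))


# Hypothetical layout with M = 3 layers and J = 7 components:
# layer 1 -> BSRC_1..2, layer 2 -> BSRC_3..4, layer 3 -> BSRC_5..7
J_BOUNDS = [3, 5, 8]
selected = components_up_to_layer(J_BOUNDS, n_b=2)
```

With N_B = 2 the sketch selects components 1 to 4, so the components of the third layer are never handed to the basic decompression.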

The required information about which components of the basic compressed sound (or sound field) representation are contained in the individual layers is assumed to be known to the decompressor 4100 from a data packet with configuration information, which is assumed to be sent and received before the frame data packets.

In order to provide the dependent side information data packets BSI_Dm, m = 1, ..., N_B, and the enhancement side information data packet ESI_{N_E}, all enhancement payloads may be input to a partial parser 4400 (see Fig. 4B) of the decompressor 4100 together with the value N_E and the value N_B. The parser may discard all payloads and data packets that will not be used for actual decompression. If the value of N_E is equal to zero, all enhancement side information data packets may be assumed to be empty.
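The filtering performed by the partial parser might be sketched as follows. This is an illustration under stated assumptions, not the parser of Fig. 4B: payloads are modelled as opaque byte strings held in two lists, and the name `partial_parse` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def partial_parse(
    bsi_d: Sequence[bytes], esi: Sequence[bytes], n_b: int, n_e: int
) -> Tuple[List[bytes], Optional[bytes]]:
    """Keep BSI_D1..BSI_D(n_b) and ESI_(n_e); drop every other payload."""
    if not 0 <= n_b <= len(bsi_d):
        raise ValueError("n_b out of range")
    if n_e not in (0, n_b) or n_e > len(esi):
        raise ValueError("n_e must be 0 or equal to n_b")
    kept_bsi_d = list(bsi_d[:n_b])
    # n_e == 0: all enhancement side information is treated as empty.
    kept_esi = esi[n_e - 1] if n_e > 0 else None
    return kept_bsi_d, kept_esi


bsi_d = [b"D1", b"D2", b"D3"]
esi = [b"E1", b"E2", b"E3"]
with_enhancement = partial_parse(bsi_d, esi, n_b=2, n_e=2)
without_enhancement = partial_parse(bsi_d, esi, n_b=2, n_e=0)
```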

If the base layer includes at least one dependent basic side information payload (portion of additional basic side information) corresponding to a respective layer, the decoding of each individual dependent basic side information payload (e.g., BSI_Dm, m = 1, ...; portion of additional basic side information) may include (i) decoding the portion of additional basic side information by referring to the components assigned to its respective layer and any layers lower than the respective layer (preliminary decoding), and (ii) correcting the portion of additional basic side information by referring to the components assigned to the highest usable layer and any layers between the highest usable layer and the respective layer (correction). Therein, the additional basic side information corresponding to a respective layer includes information that specifies decoding of one or more components among the components assigned to the respective layer in dependence on other components assigned to the respective layer and any layers lower than the respective layer.

Then, the basic reconstructed sound representation can be obtained (e.g., generated) from the components assigned to the highest usable layer and any layers lower than the highest usable layer, using the basic side information and corrected portions of additional basic side information obtained from portions of additional basic side information corresponding to layers up to the highest usable layer.

In particular, the preliminary decoding of each payload BSI_Dm (m = 1, ...) may involve exploiting its dependence on the first J_m - 1 basic compressed sound representation components BSRC_1, ..., BSRC_{J_m - 1} contained in the first m layers, which was assumed at the encoding stage.

The successive correction of each payload BSI_Dm, m = 1, ..., N_B, may involve considering that the basic sound component is finally reconstructed from the first J_{N_B} - 1 basic compressed sound representation components BSRC_1, ..., BSRC_{J_{N_B} - 1} contained in the first N_B > m layers, which are more components than assumed for the preliminary decoding. Hence, the correction may be accomplished by discarding obsolete information, which is possible due to the initially assumed property of the dependent basic side information that, if certain complementary components are added to the basic compressed sound representation, the dependent basic side information for each individual (complementary) component becomes a subset of the original one.

At S3040, a second layer index may be determined. The second layer index may indicate the portion(s) of enhancement side information that should be used for improving (e.g., enhancing) the basic reconstructed sound representation.

In addition to the first layer index, there may be determined an index (second layer index) N_E of the enhancement side information payload (portion of enhancement side information) to be used for decompression. The second layer index N_E may always either be equal to the first layer index N_B or equal to zero. That is, the enhancement may be accomplished either always in accordance with the basic sound representation obtained from the highest usable layer, or not at all.

At S3050, a reconstructed sound representation of the sound or sound field is obtained (e.g., generated) from the basic reconstructed sound representation, referring to the second layer index.

That is, the reconstructed sound representation is obtained by (parametrically) improving or enhancing the basic reconstructed sound representation, such as by using the enhancement side information (portion of enhancement side information) indicated by the second layer index. As indicated further below, the second layer index may indicate not to use any enhancement side information at all at this stage. Then, the reconstructed sound representation would correspond to the basic reconstructed sound representation.

For this purpose, the reconstructed basic sound representation together with all enhancement side information payloads ESI_1, ..., ESI_M, the basic side information payloads (e.g., BSI, or BSI_I and BSI_Dm, m = 1, ..., M), and the value N_E are provided to an Enhanced Representation Decompression processing unit 4300 (illustrated in Figs. 4A and 4B), which computes the final enhanced sound (or sound field) representation 2100' using only the enhancement side information payload ESI_{N_E} and discarding all other enhancement side information payloads. Alternatively, only the enhancement side information payload ESI_{N_E}, instead of all enhancement side information payloads, may be provided to the Enhanced Representation Decompression processing unit 4300. If the value of N_E is equal to zero, all enhancement side information payloads are discarded (or alternatively, no enhancement side information payload is provided) and the reconstructed final enhanced sound representation 2100' is equal to the reconstructed basic sound representation. The enhancement side information payload ESI_{N_E} may have been obtained by the partial parser 4400.

Fig. 3 also generally illustrates decoding the compressed HOA representation based on basic side information that is associated with the base layer and based on enhancement side information that is associated with the one or more hierarchical enhancement layers.

Unless steps require certain other steps as prerequisites, the aforementioned steps may be performed in any order and the exemplary order illustrated in Fig. 3 is understood to be nonlimiting.
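For orientation, the flow of steps S3020 to S3050 for a single, independently decodable frame can be sketched as below. The two processing units are represented by stand-in callables (`basic_decompress`, `enhance`) that are purely hypothetical, as is the string-based data model; the sketch only shows which data reaches which stage.

```python
from typing import Callable, List, Optional, Sequence


def decode_frame(
    layer_valid: Sequence[bool],
    layer_components: Sequence[Sequence[str]],
    esi: Sequence[str],
    basic_decompress: Callable[[List[str]], str],
    enhance: Callable[[str, Optional[str]], str],
) -> str:
    """Walk through S3020-S3050 for one independently decodable frame."""
    # S3020: highest usable layer = layer just below the lowest invalid one.
    n_b = 0
    for ok in layer_valid:
        if not ok:
            break
        n_b += 1
    # S3030: basic reconstruction from the components of the lowest n_b layers.
    components = [c for layer in layer_components[:n_b] for c in layer]
    basic = basic_decompress(components)
    # S3040: for independent frames the second index equals the first.
    n_e = n_b
    # S3050: enhance with ESI_(n_e) only; n_e == 0 means no enhancement.
    return enhance(basic, esi[n_e - 1] if n_e > 0 else None)


# Stand-in processing units that merely record what they were given.
result = decode_frame(
    layer_valid=[True, True, False],
    layer_components=[["c1", "c2"], ["c3"], ["c4", "c5"]],
    esi=["E1", "E2", "E3"],
    basic_decompress=lambda comps: "+".join(comps),
    enhance=lambda basic, e: basic if e is None else f"{basic}|{e}",
)
```

In this example the third layer is invalid, so only the components of layers 1 and 2 and the enhancement payload of layer 2 are used.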

Next, details of the layer selection for decompression (selection of the first and second layer indices) at steps S3020 and S3040 will be described.

Determining the first layer index may involve determining, for each layer, whether the respective layer has been validly received. Determining the first layer index may further involve determining the first layer index as the layer index of a layer immediately below the lowest layer that has not been validly received. Whether or not a layer has been validly received may be determined by evaluating whether the enhancement side information payload of that layer has been validly received. This in turn may be done by evaluating the validity flags within the enhancement side information payloads.
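A minimal sketch of this rule follows, assuming the validity flags are available as a list ordered from the base layer upwards. Returning 0 when even the base layer is invalid is an assumption of this sketch, not something the description specifies.

```python
from typing import Sequence


def first_layer_index(valid_flags: Sequence[bool]) -> int:
    """Index of the layer immediately below the lowest invalid layer.

    valid_flags[m - 1] is the validity flag of the enhancement side
    information payload of layer m (m = 1 is the base layer).
    """
    for m, valid in enumerate(valid_flags, start=1):
        if not valid:
            return m - 1
    return len(valid_flags)  # every layer was validly received


index_with_gap = first_layer_index([True, True, False, True])
```

Note that a valid layer above an invalid one (layer 4 in the example) does not raise the index, because each layer builds on all layers below it.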

Determining the second layer index may generally involve either determining the second layer index to be equal to the first layer index, or determining an index value as the second layer index (e.g., index value 0) that indicates not to use any enhancement side information when obtaining the reconstructed sound representation.

In the case that all frame data packets may be decompressed independently of each other, both the number N_B of the highest layer (highest usable layer) to be actually used for decompression of the basic sound representation and the index N_E of the enhancement side information payload to be used for decompression may be set to the highest number L of a valid enhancement side information payload, which itself may be determined by evaluating the validity flags within the enhancement side information payloads. By exploiting the knowledge of the size of each enhancement side information payload, a complicated parsing through the actual data of the payloads for the determination of their validity can be avoided.

That is, the second layer index may be determined to be equal to the first layer index if the compressed sound representations for the successive time intervals can be decoded independently. In this case, the reconstructed basic sound representation may be enhanced based on the enhancement side information payload of the highest usable layer.

In case differential decompression with inter-frame dependencies is employed, the decision from the previous frame has to be considered in addition. Note that with differential decompression, independent frame data packets are usually transmitted at regular time intervals in order to allow starting the decompression from these time instants, where the determination of the values N_B and N_E becomes frame independent and is carried out as described above.

To explain the proposed frame dependent decision in detail, the highest number (e.g., layer index) of a valid enhancement side information payload for a k-th frame is denoted by L(k), the highest layer number (e.g., layer index) to be selected and used for decompression of the basic sound representation by N_B(k), and the number (e.g., layer index) of the enhancement side information payload to be used for decompression by N_E(k).

Using this notation, the highest layer number N_B(k) to be used for decompression of the basic sound representation may be computed according to

$$N_B(k) = \min\bigl(N_B(k-1),\, L(k)\bigr). \tag{7}$$

By choosing N_B(k) not to be greater than N_B(k - 1) and L(k), it is ensured that all information required for differential decompression of the basic sound representation is available.

That is, if the compressed sound representations for the successive time intervals (e.g., frames) cannot be decoded independently of each other, determining the first layer index may comprise determining, for each layer, whether the respective layer has been validly received, and determining the first layer index for the given time interval as the smaller one of the first layer index of the time interval preceding the given time interval and the layer index of a layer immediately below the lowest layer that has not been validly received.

The number N_E(k) of the enhancement side information payload to be used for decompression may be determined according to

$$N_E(k) = \begin{cases} N_B(k) & \text{if } N_B(k) = N_B(k-1) \\ 0 & \text{else} \end{cases} \tag{8}$$

Therein, the choice of 0 for N_E(k) indicates that the reconstructed basic sound representation is not to be improved or enhanced using enhancement side information.

This means in particular that as long as the highest layer number N_B(k) to be used for decompression of the basic sound representation does not change, the same corresponding enhancement layer number is selected. However, in case of a change of N_B(k), the enhancement is disabled by setting N_E(k) to zero. Due to the assumed differential decompression of the enhancement side information, its change according to N_B(k) is not possible, since it would require the decompression of the corresponding enhancement side information layer at the previous frame, which is assumed not to have been carried out.

That is, if the compressed sound representations for the successive time intervals (e.g., frames) cannot be decoded independently of each other, determining the second layer index may comprise determining whether the first layer index for the given time interval is equal to the first layer index for the preceding time interval. If the first layer index for the given time interval is equal to the first layer index for the preceding time interval, the second layer index for the given time interval may be determined (e.g., selected) to be equal to the first layer index for the given time interval. On the other hand, if the first layer index for the given time interval is not equal to the first layer index for the preceding time interval, an index value may be determined (e.g., selected) as the second layer index that indicates not to use any enhancement side information when obtaining the reconstructed sound representation.
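The frame-dependent selection of Equations (7) and (8) can be written out as a short sketch for a single dependent frame. The function name and argument names are hypothetical; `l_k` stands for L(k) and `n_b_prev` for N_B(k - 1).

```python
from typing import Tuple


def select_layers_differential(l_k: int, n_b_prev: int) -> Tuple[int, int]:
    """Return (N_B(k), N_E(k)) for a frame that depends on the previous frame."""
    n_b = min(n_b_prev, l_k)             # Equation (7)
    n_e = n_b if n_b == n_b_prev else 0  # Equation (8)
    return n_b, n_e


unchanged = select_layers_differential(l_k=3, n_b_prev=3)
dropped = select_layers_differential(l_k=2, n_b_prev=3)
recovered = select_layers_differential(l_k=3, n_b_prev=2)
```

In the second call a layer is lost, so N_B falls to 2 and the enhancement is disabled for that frame. In the third call the lost layer is valid again, but N_B stays at 2 because the previous frame was only decoded up to layer 2.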

Alternatively, if at decompression all of the enhancement side information payloads with numbers up to N_E(k) are decompressed in parallel, the selection rule in Equation (8) can be replaced by

$$N_E(k) = N_B(k). \tag{9}$$

Finally, note that for differential decompression the number of the highest used layer N_B can only increase at independent frame data packets, whereas a decrease is possible at every frame.
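This behaviour can be traced over a short sequence of frames. The sketch below is illustrative only: it assumes that the stream starts with an independent frame, models each frame as a hypothetical pair (L(k), is_independent), and uses a `parallel_esi` flag to switch from the rule of Equation (8) to the alternative rule of Equation (9).

```python
from typing import Iterable, List, Tuple


def trace_selection(
    frames: Iterable[Tuple[int, bool]], parallel_esi: bool = False
) -> List[Tuple[int, int]]:
    """frames yields (L(k), is_independent); returns [(N_B(k), N_E(k)), ...]."""
    result = []
    n_b_prev = 0
    for l_k, is_independent in frames:
        if is_independent:
            n_b = n_e = l_k  # frame-independent selection
        else:
            n_b = min(n_b_prev, l_k)  # Equation (7)
            if parallel_esi:
                n_e = n_b  # Equation (9)
            else:
                n_e = n_b if n_b == n_b_prev else 0  # Equation (8)
        result.append((n_b, n_e))
        n_b_prev = n_b
    return result


frames = [(3, True), (3, False), (2, False), (3, False), (3, True)]
default_trace = trace_selection(frames)
parallel_trace = trace_selection(frames, parallel_esi=True)
```

In both traces N_B drops from 3 to 2 at the third frame and returns to 3 only at the final, independent frame, even though layer 3 is valid again at the fourth frame.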

It is understood that the proposed method of layered encoding of a compressed sound representation may be implemented by an encoder for layered encoding of a compressed sound representation. Such encoder may comprise respective units adapted to carry out respective steps described above. An example of such encoder 5000 is schematically illustrated in Fig. 5. For instance, such encoder 5000 may comprise a component sub-dividing unit 5010 adapted to perform aforementioned S1010, a component assignment unit 5020 adapted to perform aforementioned S1020, a basic side information assignment unit 5030 adapted to perform aforementioned S1030, an enhancement side information partitioning unit 5040 adapted to perform aforementioned S1040, and an enhancement side information assignment unit 5050 adapted to perform aforementioned S1050. It is further understood that the respective units of such encoder may be embodied by a processor 5100 of a computing device that is adapted to perform the processing carried out by each of said respective units, i.e. that is adapted to carry out some or all of the aforementioned steps, as well as any further steps of the proposed encoding method. The encoder or computing device may further comprise a memory 5200 that is accessible by the processor 5100.

It is further understood that the proposed method of decoding a compressed sound representation that is encoded in a plurality of hierarchical layers may be implemented by a decoder for decoding a compressed sound representation that is encoded in a plurality of hierarchical layers. Such decoder may comprise respective units adapted to carry out respective steps described above. An example of such decoder 6000 is schematically illustrated in Fig. 6. For instance, such decoder 6000 may comprise a reception unit 6010 adapted to perform aforementioned S3010, a first layer index determination unit 6020 adapted to perform aforementioned S3020, a basic reconstruction unit 6030 adapted to perform aforementioned S3030, a second layer index determination unit 6040 adapted to perform aforementioned S3040, and an enhanced reconstruction unit 6050 adapted to perform aforementioned S3050. It is further understood that the respective units of such decoder may be embodied by a processor 6100 of a computing device that is adapted to perform the processing carried out by each of said respective units, i.e. that is adapted to carry out some or all of the aforementioned steps, as well as any further steps of the proposed decoding method. The decoder or computing device may further comprise a memory 6200 that is accessible by the processor 6100.

It should be noted that the description and drawings merely illustrate the principles of the proposed methods and apparatus. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope. Furthermore, all examples recited herein are principally intended expressly to be only for pedagogical purposes to aid the reader in understanding the principles of the proposed methods and apparatus and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass equivalents thereof.

The methods and apparatus described in the present document may be implemented as software, firmware and/or hardware. Certain components may e.g. be implemented as software running on a digital signal processor or microprocessor. Other components may e.g. be implemented as hardware and/or as application-specific integrated circuits. The signals encountered in the described methods and apparatus may be stored on media such as random access memory or optical storage media. They may be transferred via networks, such as radio networks, satellite networks, wireless networks or wireline networks, e.g. the Internet.

Reference 1: ISO/IEC JTC1/SC29/WG11 23008-3:2015(E). Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio, February 2015.

Reference 2: ISO/IEC JTC1/SC29/WG11 23008-3:2015/PDAM3. Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio, AMENDMENT 3: MPEG-H 3D Audio Phase 2, July 2015.

Claims

1. A method of decoding a compressed Higher Order Ambisonics (HOA) representation of a sound or sound field, the method comprising:

receiving a bit stream containing the compressed HOA representation corresponding to a plurality of hierarchical layers that include a base layer and two or more hierarchical enhancement layers, and containing basic side information that is associated with the base layer and enhancement side information that is associated with the two or more hierarchical enhancement layers, wherein the plurality of layers have assigned thereto components of a basic compressed sound representation of the sound or sound field, the components being assigned to respective layers in respective groups of components, wherein the two or more hierarchical enhancement layers comprise a highest usable hierarchical enhancement layer, and wherein each of the two or more hierarchical enhancement layers includes a portion of the enhancement side information including parameters for improving a basic reconstructed sound representation obtainable from data included in the respective layer and any layers lower than the respective layer; and

decoding the compressed HOA representation based on the basic side information that is associated with the base layer, based on the portion of the enhancement side information that is associated with the highest usable hierarchical enhancement layer, and not based on the portion of the enhancement side information that is associated with any other layer of the two or more hierarchical enhancement layers.

2. The method of claim 1, wherein the components of the basic compressed sound representation correspond to monaural signals; and the monaural signals represent either predominant sound signals or coefficient sequences of an HOA representation.

3. The method of any of claims 1-2, wherein the bit stream includes data payloads respectively corresponding to one or more of the hierarchical layers.

4. The method of any of claims 1-3, wherein the enhancement side information includes parameters related to at least one of: spatial prediction, sub-band directional signals synthesis, and parametric ambience replication.

5. The method of any of claims 1-4, wherein the enhancement side information includes information that allows prediction of missing portions of the sound or sound field from directional signals.

6. The method of any of claims 1-5, further comprising:

determining, for each layer, whether the respective layer has been validly received; and

determining a layer index of a layer immediately below a lowest layer that has not been validly received.

7. The method of claim 6, further comprising determining a further layer index that is either equal to the layer index or that indicates omission of enhancement side information during decoding.

8. The method of any one of claims 1-7, wherein the base layer includes at least one portion of additional basic side information corresponding to a respective layer and including information that specifies decoding of one or more components among the components assigned to the respective layer in dependence on other components assigned to the respective layer and any layers lower than the respective layer, the method comprising, for each portion of additional basic side information:

decoding the portion of additional basic side information by referring to the components assigned to its respective layer and any layers lower than the respective layer; and

correcting the portion of additional basic side information by referring to the components assigned to the highest usable hierarchical enhancement layer and any layers between the highest usable hierarchical enhancement layer and the respective layer, wherein the basic reconstructed sound representation is obtained from the components assigned to the highest usable hierarchical enhancement layer and any layers lower than the highest usable hierarchical enhancement layer, using the basic side information and corrected portions of additional basic side information obtained from portions of additional basic side information corresponding to layers up to the highest usable hierarchical enhancement layer.

9. An apparatus for decoding a compressed Higher Order Ambisonics (HOA) representation of a sound or sound field, the apparatus comprising:

a receiver for receiving a bit stream containing the compressed HOA representation corresponding to a plurality of hierarchical layers that include a base layer and two or more hierarchical enhancement layers, and containing basic side information that is associated with the base layer and enhancement side information that is associated with the two or more hierarchical enhancement layers, wherein the plurality of layers have assigned thereto components of a basic compressed sound representation of the sound or sound field, the components being assigned to respective layers in respective groups of components, wherein the two or more hierarchical enhancement layers comprise a highest usable hierarchical enhancement layer, and wherein each of the two or more hierarchical enhancement layers includes a portion of the enhancement side information including parameters for improving a basic reconstructed sound representation obtainable from data included in the respective layer and any layers lower than the respective layer; and

a decoder for decoding the compressed HOA representation based on the basic side information that is associated with the base layer, based on the portion of the enhancement side information that is associated with the highest usable hierarchical enhancement layer, and not based on the portion of the enhancement side information that is associated with any other layer of the two or more hierarchical enhancement layers.

10. The apparatus of claim 9, wherein the components of the basic compressed sound representation correspond to monaural signals; and the monaural signals represent either predominant sound signals or coefficient sequences of an HOA representation.

11. The apparatus of any of claims 9-10, wherein the bit stream includes data payloads respectively corresponding to one or more of the hierarchical layers.

12. The apparatus of any of claims 9-11, wherein the enhancement side information includes parameters related to at least one of: spatial prediction, sub-band directional signals synthesis, and parametric ambience replication.

13. The apparatus of any of claims 9-12, wherein the enhancement side information includes information that allows prediction of missing portions of the sound or sound field from directional signals.

14. The apparatus of any of claims 9-13, configured to:

determine, for each layer, whether the respective layer has been validly received; and

determine a layer index of a layer immediately below a lowest layer that has not been validly received.

15. The apparatus of claim 14, further configured to determine a further layer index that is either equal to the layer index or that indicates omission of enhancement side information during decoding.

16. The apparatus of any one of claims 9-15, wherein the base layer includes at least one portion of additional basic side information corresponding to a respective layer and including information that specifies decoding of one or more components among the components assigned to the respective layer in dependence on other components assigned to the respective layer and any layers lower than the respective layer, and wherein for each portion of additional basic side information, the apparatus is configured to:

decode the portion of additional basic side information by referring to the components assigned to its respective layer and any layers lower than the respective layer; and

correct the portion of additional basic side information by referring to the components assigned to the highest usable hierarchical enhancement layer and any layers between the highest usable hierarchical enhancement layer and the respective layer, wherein the basic reconstructed sound representation is obtained from the components assigned to the highest usable hierarchical enhancement layer and any layers lower than the highest usable hierarchical enhancement layer, using the basic side information and corrected portions of additional basic side information obtained from portions of additional basic side information corresponding to layers up to the highest usable hierarchical enhancement layer.

17. A non-transitory computer readable medium comprising computer interpretable instructions which, when executed by one or more processors of a computing device, cause the computing device to perform the method of any one of claims 1 to 8.