WO2019243670A1

WO2019243670A1 - Determination of spatial audio parameter encoding and associated decoding

Info

Publication number: WO2019243670A1
Application number: PCT/FI2019/050476
Authority: WO
Inventors: Adriana Vasilache
Original assignee: Nokia Technologies Oy
Priority date: 2018-06-21
Filing date: 2019-06-19
Publication date: 2019-12-26
Also published as: GB201810221D0; GB2574873A

Abstract

An apparatus comprising means for receiving a frame of values for sub-bands (201), the values representing at least one of an azimuth index (205), elevation index (205) or energy ratio (110); separating the values into arrays of data values (301, 303, 305), each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represent the lesser significant base integer values; arithmetic encoding (313, 323, 333) the first array using a number of symbols based on an expected maximum first array value; and arithmetic encoding the second array using a number of symbols based on an associated first array value.

Description

DETERMINATION OF SPATIAL AUDIO PARAMETER ENCODING AND ASSOCIATED DECODING

Field

The present application relates to apparatus and methods for sound-field related parameter encoding, but not exclusively for time-frequency domain direction related parameter encoding for an audio encoder and decoder.

Background

Parametric spatial audio processing is a field of audio signal processing where the spatial aspect of the sound is described using a set of parameters. For example, in parametric spatial audio capture from microphone arrays, it is a typical and an effective choice to estimate from the microphone array signals a set of parameters such as directions of the sound in frequency bands, and the ratios between the directional and non-directional parts of the captured sound in frequency bands. These parameters are known to well describe the perceptual spatial properties of the captured sound at the position of the microphone array. These parameters can be utilized in synthesis of the spatial sound accordingly, for headphones binaurally, for loudspeakers, or to other formats, such as Ambisonics.

The directions and direct-to-total energy ratios in frequency bands are thus a parameterization that is particularly effective for spatial audio capture.

A parameter set consisting of a direction parameter in frequency bands and an energy ratio parameter in frequency bands (indicating the directionality of the sound) can be also utilized as the spatial metadata (which may also include other parameters such as coherence, spread coherence, number of directions, distance etc) for an audio codec. For example, these parameters can be estimated from microphone-array captured audio signals, and for example a stereo signal can be generated from the microphone array signals to be conveyed with the spatial metadata. The stereo signal could be encoded, for example, with an AAC encoder. A decoder can decode the audio signals into PCM signals, and process the sound in frequency bands (using the spatial metadata) to obtain the spatial output, for example a binaural output.

The aforementioned solution is particularly suitable for encoding captured spatial sound from microphone arrays (e.g., in mobile phones, VR cameras, stand-alone microphone arrays). However, it may be desirable for such an encoder to also accept input types other than microphone-array captured signals, for example, loudspeaker signals, audio object signals, or Ambisonic signals.

Analysing first-order Ambisonics (FOA) inputs for spatial metadata extraction has been thoroughly documented in scientific literature related to Directional Audio Coding (DirAC) and Harmonic planewave expansion (Harpex). This is since there exist microphone arrays directly providing a FOA signal (more accurately: its variant, the B-format signal), and analysing such an input has thus been a point of study in the field.

A further input for the encoder is also multi-channel loudspeaker input, such as 5.1 or 7.1 channel surround inputs.

However, the directional components of the metadata may comprise an elevation and an azimuth (and an energy ratio, which is 1 - diffuseness) of a resulting direction for each considered time/frequency subband, and quantization of these directional components is a current research topic.

Summary

There is provided according to a first aspect an apparatus comprising means for: receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represent the lesser significant base integer values; arithmetic encoding the first array using a number of symbols based on an expected maximum first array value; and arithmetic encoding the second array using a number of symbols based on an associated first array value.
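The separating step of the first aspect can be illustrated with a minimal sketch, assuming base 10 and three arrays. The function name `split_digits`, the parameter names and the example frame are illustrative assumptions and do not appear in the application.

```python
def split_digits(values, base=10, num_arrays=3):
    """Split each value into num_arrays digits, most significant first.

    Returns one list per digit position: arrays[0] holds the most
    significant digits, arrays[-1] the least significant digits.
    """
    arrays = [[] for _ in range(num_arrays)]
    for v in values:
        if not 0 <= v < base ** num_arrays:
            raise ValueError(f"value {v} does not fit in {num_arrays} digits")
        for k in range(num_arrays):
            power = base ** (num_arrays - 1 - k)
            arrays[k].append((v // power) % base)
    return arrays


# Hypothetical frame of index values, one per sub-band.
frame = [127, 45, 103, 9]
first, second, third = split_digits(frame)
# first  == [1, 0, 1, 0]   (hundreds)
# second == [2, 4, 0, 0]   (tens)
# third  == [7, 5, 3, 9]   (units)
```

Each resulting array can then be passed to its own arithmetic encoding step, as the aspect describes.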

The base integer expansion values may be decimal values.

The means for arithmetic encoding the second array using a number of symbols based on the associated first array value may be further for: arithmetic encoding the second array using a reduced number of symbols based on the associated first array value being the expected maximum first array value; and arithmetic encoding the second array using a maximum available number of symbols based on the associated first array value being other than the expected maximum first array value.

The reduced number of symbols may be based on an expected maximum second array value when the associated first array value is the expected maximum first array value.

The means may be further for: arithmetic encoding the third array using the maximum available number of symbols.
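The dependence of the symbol count on the first array value can be sketched as follows. This is only an illustration under stated assumptions: values are base 10 with three digits, the expected maximum value (127 here) is hypothetical, and taking "maximum digit plus one" as the number of symbols is an assumed reading of the text above rather than something the application states.

```python
def symbol_counts(value, max_value, base=10):
    """Return the alphabet sizes (first, second, third) for one value.

    max_value is the expected maximum value (an assumed parameter).
    """
    max_first = max_value // base ** 2          # largest possible first digit
    max_second = (max_value // base) % base     # second digit of max_value
    first = value // base ** 2                  # first digit of this value

    n_first = max_first + 1                     # symbols 0..max_first
    if first == max_first:
        # The second digit cannot exceed that of the expected maximum.
        n_second = max_second + 1               # reduced number of symbols
    else:
        n_second = base                         # maximum available number
    n_third = base                              # third array: always the maximum
    return n_first, n_second, n_third


print(symbol_counts(103, 127))  # (2, 3, 10)
print(symbol_counts(45, 127))   # (2, 10, 10)
```

With an expected maximum of 127, a value whose first digit is 1 can only have a second digit of 0, 1 or 2, so three symbols suffice for the second array in that case.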

The means for arithmetic encoding the second array using a number of symbols based on the associated first array value may be further for: comparing a value within the first array to a preceding value in the first array; arithmetic encoding the second array using a number of symbols based on the associated first array value based on the associated first array value being different to the preceding first array value; and entropy encoding the second array value based on the associated first array value being the same as the preceding first array value.

The means for arithmetic encoding the second array using a number of symbols based on the associated first array value may be further for: comparing a value within the second array to a preceding value in the second array, wherein the means for arithmetic encoding the second array using a number of symbols may be further for: arithmetic encoding the second array using a number of symbols based on the value of the second array being different to the preceding value in the second array; and generating an indicator indicating a skipping of the arithmetic encoding based on the value of the second array being the same as the preceding value in the second array.

The means for arithmetic encoding the third array using a maximum available number of symbols may be further for: comparing a value within the third array to the preceding value in the third array; arithmetic encoding the third array using the maximum available number of symbols based on the value of the third array being different to the preceding value in the third array; and generating an indicator indicating the skipping of the arithmetic encoding based on the value of the third array being the same as the preceding value in the third array.

The means may be further for generating an indicator identifying a type of encoding applied to at least one of the first array, the second array and the third array.
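The comparison with the preceding value and the generation of a skip indicator may be sketched as below. The function name `mark_repeats`, the use of one flag per value and the treatment of the first value (always coded) are assumptions made for illustration; the application does not fix how the indicator is signalled.

```python
def mark_repeats(digits):
    """Return (flags, to_code) for one digit array.

    flags[i] is 1 when digits[i] equals the preceding digit, in which case
    arithmetic encoding is skipped; otherwise it is 0 and the digit is
    appended to to_code for arithmetic encoding.
    """
    flags = []
    to_code = []
    previous = None
    for d in digits:
        if previous is not None and d == previous:
            flags.append(1)          # same as preceding value: skip
        else:
            flags.append(0)          # different: arithmetic encode it
            to_code.append(d)
        previous = d
    return flags, to_code


flags, to_code = mark_repeats([2, 2, 4, 4, 4, 0])
# flags   == [0, 1, 0, 1, 1, 0]
# to_code == [2, 4, 0]
```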

A maximum available number of symbols may be determined based on the value base.

The maximum available number of symbols, determined based on the value base, may be 10 when the value is a base 10 number.
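As a rough illustration of why the number of symbols matters, an ideal arithmetic coder with a uniform probability model over N symbols spends about log2(N) bits per symbol. The uniform model is an assumption made here for the arithmetic only; the application does not specify the probability model, and the helper name `ideal_bits` is hypothetical.

```python
import math


def ideal_bits(symbol_counts):
    """Sum of log2(n) over the alphabet sizes, i.e. the cost in bits
    of coding one symbol from each alphabet under a uniform model."""
    return sum(math.log2(n) for n in symbol_counts)


full = ideal_bits([10])    # one digit with the maximum available 10 symbols
reduced = ideal_bits([3])  # one digit with a reduced alphabet of 3 symbols
print(round(full, 2), round(reduced, 2))  # 3.32 1.58
```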

According to a second aspect there is provided an apparatus comprising means for: receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represent the lesser significant base integer values; arithmetic decoding the first array using a number of symbols based on an expected maximum first array value; and arithmetic decoding the second array using a number of symbols based on an associated first array decoded value.

The means for arithmetic decoding the second array using a number of symbols based on the associated first array decoded value may be further for: arithmetic decoding the second array using a reduced number of symbols based on the associated first array decoded value being the expected maximum first array value; and arithmetic decoding the second array using a maximum available number of symbols based on the associated first array decoded value being other than the expected maximum first array value.

The reduced number of symbols may be based on an expected maximum second array value when the associated first array decoded value is the expected maximum first array value.

The means may be further for arithmetic decoding the third array using a maximum available number of symbols.
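On the decoder side, the number of symbols for the second array can only be chosen after the first array value has been decoded. A minimal sketch of that ordering follows. The callback `read_symbol(n)`, standing in for an arithmetic decoder that returns the next symbol from an alphabet of size n, is hypothetical, as are the base 10, three digit layout, the "maximum digit plus one" alphabet sizes and the final recombination step, which the application does not spell out.

```python
def decode_value(read_symbol, max_value, base=10):
    """Decode one three-digit value, most significant digit first.

    read_symbol(n) is an assumed stand-in for an arithmetic decoder call
    that returns a symbol in range(n).
    """
    max_first = max_value // base ** 2
    max_second = (max_value // base) % base

    first = read_symbol(max_first + 1)
    # Alphabet of the second digit depends on the decoded first digit.
    n_second = max_second + 1 if first == max_first else base
    second = read_symbol(n_second)
    third = read_symbol(base)        # third array: maximum available symbols
    return (first * base + second) * base + third


# Stub decoder that replays known digits and records the requested sizes.
digits = iter([1, 0, 3])
requested = []

def stub_read(n):
    requested.append(n)
    return next(digits)

value = decode_value(stub_read, max_value=127)
# value == 103, requested == [2, 3, 10]
```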

The means for arithmetic decoding the second array using a number of symbols based on the associated first array decoded value may be further for: comparing a decoded value within the first array to a preceding decoded value in the first array, wherein the means for arithmetic decoding the second array using a number of symbols based on the associated first array decoded value may be further for: arithmetic decoding the second array using a number of symbols based on the associated first array decoded value being different to the preceding first array decoded value; and entropy decoding the second array value based on the associated first array decoded value being the same as the preceding first array decoded value.

The means for arithmetic decoding the second array using a number of symbols based on the associated first array decoded value may be further for: receiving an indicator indicating the skipping of the arithmetic encoding of the second array; arithmetic decoding the second array value using a number of symbols based on the value of the indicator indicating the arithmetic encoding of the second array; and copying a preceding decoded value in the second array to be a decoded value in the second array based on a value of the indicator indicating a skipping of the arithmetic encoding of the second array.

The means for arithmetic decoding the third array using a maximum available number of symbols may be further for: receiving an indicator indicating the skipping of the arithmetic encoding of the third array; arithmetic decoding the third array value using a maximum available number of symbols based on the value of the indicator indicating the arithmetic encoding of the third array; and copying a preceding decoded value in the third array to be a decoded value in the third array based on a value of the indicator indicating a skipping of the arithmetic encoding of the third array.
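The copying of a preceding decoded value when the indicator signals a skip may be sketched as follows. The per-value flag layout and the function name `restore_repeats` are assumptions for illustration; `coded` stands for the digits already recovered by the arithmetic decoder for the non-skipped positions.

```python
def restore_repeats(flags, coded):
    """Rebuild a digit array from skip flags and arithmetic-decoded digits.

    flags[i] == 1 means the value at position i was skipped at the encoder
    and is copied from the preceding decoded value; flags[i] == 0 means the
    next digit is taken from coded.
    """
    remaining = iter(coded)
    out = []
    for flag in flags:
        if flag:
            if not out:
                raise ValueError("skip indicated with no preceding value")
            out.append(out[-1])          # copy preceding decoded value
        else:
            out.append(next(remaining))  # arithmetic-decoded value
    return out


restored = restore_repeats([0, 1, 0, 1, 1, 0], [2, 4, 0])
# restored == [2, 2, 4, 4, 4, 0]
```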

The means may be further for: receiving an indicator identifying a type of encoding applied to at least one of the first array, the second array and the third array; and decoding the at least one of the first array, the second array and the third array based on the indicator.

According to a third aspect there is provided a method comprising: receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represent the lesser significant base integer values; arithmetic encoding the first array using a number of symbols based on an expected maximum first array value; and arithmetic encoding the second array using a number of symbols based on an associated first array value.

The base integer expansion values may be decimal values.

Arithmetic encoding the second array using a number of symbols based on the associated first array value may further comprise: arithmetic encoding the second array using a reduced number of symbols based on the associated first array value being the expected maximum first array value; arithmetic encoding the second array using a maximum available number of symbols based on the associated first array value being other than the expected maximum first array value.

The method may further comprise: arithmetic encoding the third array using the maximum available number of symbols.

Arithmetic encoding the second array using a number of symbols based on the associated first array value may further comprise: comparing a value within the first array to a preceding value in the first array; arithmetic encoding the second array using a number of symbols based on the associated first array value based on the associated first array value being different to the preceding first array value; and entropy encoding the second array value based on the associated first array value being the same as the preceding first array value.

Arithmetic encoding the second array using a number of symbols based on the associated first array value may further comprise: comparing a value within the second array to a preceding value in the second array, wherein arithmetic encoding the second array using a number of symbols may further comprise: arithmetic encoding the second array using a number of symbols based on the value of the second array being different to the preceding value in the second array; and generating an indicator indicating a skipping of the arithmetic encoding based on the value of the second array being the same as the preceding value in the second array.

Arithmetic encoding the third array using a maximum available number of symbols may further comprise: comparing a value within the third array to the preceding value in the third array; arithmetic encoding the third array using the maximum available number of symbols based on the value of the third array being different to the preceding value in the third array; and generating an indicator indicating the skipping of the arithmetic encoding based on the value of the third array being the same as the preceding value in the third array.

The method may further comprise generating an indicator identifying a type of encoding applied to at least one of the first array, the second array and the third array.

According to a fourth aspect there is provided a method comprising: receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represent the lesser significant base integer values; arithmetic decoding the first array using a number of symbols based on an expected maximum first array value; and arithmetic decoding the second array using a number of symbols based on an associated first array decoded value.

Arithmetic decoding the second array using a number of symbols based on the associated first array decoded value may further comprise: arithmetic decoding the second array using a reduced number of symbols based on the associated first array decoded value being the expected maximum first array value; and arithmetic decoding the second array using a maximum available number of symbols based on the associated first array decoded value being other than the expected maximum first array value.

The reduced number of symbols may be based on an expected maximum second array value when the associated first array decoded value is the expected maximum first array value.

The method may further comprise arithmetic decoding the third array using a maximum available number of symbols.

Arithmetic decoding the second array using a number of symbols based on the associated first array decoded value may further comprise: comparing a decoded value within the first array to a preceding decoded value in the first array, wherein arithmetic decoding the second array using a number of symbols based on the associated first array decoded value may further comprise: arithmetic decoding the second array using a number of symbols based on the associated first array decoded value being different to the preceding first array decoded value; and entropy decoding the second array value based on the associated first array decoded value being the same as the preceding first array decoded value.

Arithmetic decoding the second array using a number of symbols based on the associated first array decoded value may further comprise: receiving an indicator indicating the skipping of the arithmetic encoding of the second array; arithmetic decoding the second array value using a number of symbols based on the value of the indicator indicating the arithmetic encoding of the second array; and copying a preceding decoded value in the second array to be a decoded value in the second array based on a value of the indicator indicating a skipping of the arithmetic encoding of the second array.

Arithmetic decoding the third array using a maximum available number of symbols may further comprise: receiving an indicator indicating the skipping of the arithmetic encoding of the third array; arithmetic decoding the third array value using a maximum available number of symbols based on the value of the indicator indicating the arithmetic encoding of the third array; and copying a preceding decoded value in the third array to be a decoded value in the third array based on a value of the indicator indicating a skipping of the arithmetic encoding of the third array.

The method may further comprise: receiving an indicator identifying a type of encoding applied to at least one of the first array, the second array and the third array; and decoding the at least one of the first array, the second array and the third array based on the indicator.

According to a fifth aspect there is provided an apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: receive a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separate the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represent the lesser significant base integer values; arithmetic encode the first array using a number of symbols based on an expected maximum first array value; and arithmetic encode the second array using a number of symbols based on an associated first array value.

The base integer expansion values may be decimal values.

The apparatus caused to arithmetic encode the second array using a number of symbols based on the associated first array value may further be caused to: arithmetic encode the second array using a reduced number of symbols based on the associated first array value being the expected maximum first array value; and arithmetic encode the second array using a maximum available number of symbols based on the associated first array value being other than the expected maximum first array value.

The apparatus may further be caused to: arithmetic encode the third array using the maximum available number of symbols.

The apparatus caused to arithmetic encode the second array using a number of symbols based on the associated first array value may further be caused to: compare a value within the first array to a preceding value in the first array; arithmetic encode the second array using a number of symbols based on the associated first array value based on the associated first array value being different to the preceding first array value; and entropy encode the second array value based on the associated first array value being the same as the preceding first array value.

The apparatus caused to arithmetic encode the second array using a number of symbols based on the associated first array value may further be caused to: compare a value within the second array to a preceding value in the second array, wherein the apparatus caused to arithmetic encode the second array using a number of symbols may further be caused to: arithmetic encode the second array using a number of symbols based on the value of the second array being different to the preceding value in the second array; and generate an indicator indicating a skipping of the arithmetic encoding based on the value of the second array being the same as the preceding value in the second array.

The apparatus caused to arithmetic encode the third array using a maximum available number of symbols may further be caused to: compare a value within the third array to the preceding value in the third array; arithmetic encode the third array using the maximum available number of symbols based on the value of the third array being different to the preceding value in the third array; and generate an indicator indicating the skipping of the arithmetic encoding based on the value of the third array being the same as the preceding value in the third array.

The apparatus may be further caused to generate an indicator identifying a type of encoding applied to at least one of the first array, the second array and the third array.

According to a sixth aspect there is provided an apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: receive a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separate the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represent the lesser significant base integer values; arithmetic decode the first array using a number of symbols based on an expected maximum first array value; and arithmetic decode the second array using a number of symbols based on an associated first array decoded value.

The apparatus caused to arithmetic decode the second array using a number of symbols based on the associated first array decoded value may further be caused to: arithmetic decode the second array using a reduced number of symbols based on the associated first array decoded value being the expected maximum first array value; and arithmetic decode the second array using a maximum available number of symbols based on the associated first array decoded value being other than the expected maximum first array value.

The reduced number of symbols may be based on an expected maximum second array value when the associated first array decoded value is the expected maximum first array value.

The apparatus may be further caused to arithmetic decode the third array using a maximum available number of symbols.

The apparatus caused to arithmetic decode the second array using a number of symbols based on the associated first array decoded value may be further caused to: compare a decoded value within the first array to a preceding decoded value in the first array, wherein the apparatus caused to arithmetic decode the second array using a number of symbols based on the associated first array decoded value may further be caused to: arithmetic decode the second array using a number of symbols based on the associated first array decoded value being different to the preceding first array decoded value; and entropy decode the second array value based on the associated first array decoded value being the same as the preceding first array decoded value.

The apparatus caused to arithmetic decode the second array using a number of symbols based on the associated first array decoded value may further be caused to: receive an indicator indicating the skipping of the arithmetic encoding of the second array; arithmetic decode the second array value using a number of symbols based on the value of the indicator indicating the arithmetic encoding of the second array; and copy a preceding decoded value in the second array to be a decoded value in the second array based on a value of the indicator indicating a skipping of the arithmetic encoding of the second array.

The apparatus caused to arithmetic decode the third array using a maximum available number of symbols may further be caused to: receive an indicator indicating the skipping of the arithmetic encoding of the third array; arithmetic decode the third array value using a maximum available number of symbols based on the value of the indicator indicating the arithmetic encoding of the third array; and copy a preceding decoded value in the third array to be a decoded value in the third array based on a value of the indicator indicating a skipping of the arithmetic encoding of the third array.

The apparatus may be further caused to: receive an indicator identifying a type of encoding applied to at least one of the first array, the second array and the third array; and decode the at least one of the first array, the second array and the third array based on the indicator.

According to a seventh aspect there is provided an apparatus comprising: means for receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; means for separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represents the lesser significant base integer values; means for arithmetic encoding the first array using a number of symbols based on an expected maximum first array value; and means for arithmetic encoding the second array using a number of symbols based on an associated first array value.

According to an eighth aspect there is provided an apparatus comprising means for receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; means for separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represents the lesser significant base integer values; means for arithmetic decoding the first array using a number of symbols based on an expected maximum first array value; and means for arithmetic decoding the second array using a number of symbols based on an associated first array decoded value.

According to a ninth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus to perform at least the following: receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represents the lesser significant base integer values; arithmetic encoding the first array using a number of symbols based on an expected maximum first array value; and arithmetic encoding the second array using a number of symbols based on an associated first array value.

According to a tenth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus to perform at least the following: receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represents the lesser significant base integer values; arithmetic decoding the first array using a number of symbols based on an expected maximum first array value; and arithmetic decoding the second array using a number of symbols based on an associated first array decoded value.

According to an eleventh aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represents the lesser significant base integer values; arithmetic encoding the first array using a number of symbols based on an expected maximum first array value; and arithmetic encoding the second array using a number of symbols based on an associated first array value.

According to a twelfth aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represents the lesser significant base integer values; arithmetic decoding the first array using a number of symbols based on an expected maximum first array value; and arithmetic decoding the second array using a number of symbols based on an associated first array decoded value.

According to a thirteenth aspect there is provided an apparatus comprising: receiving circuitry configured to receive a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represents the lesser significant base integer values; encoding circuitry configured to arithmetic encode the first array using a number of symbols based on an expected maximum first array value; and the encoding circuitry configured to arithmetic encode the second array using a number of symbols based on an associated first array value.

According to a fourteenth aspect there is provided an apparatus comprising: receiving circuitry configured to receive a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represents the lesser significant base integer values; decoding circuitry configured to arithmetic decode the first array using a number of symbols based on an expected maximum first array value; and the decoding circuitry configured to further arithmetic decode the second array using a number of symbols based on an associated first array decoded value.

According to a fifteenth aspect there is provided a computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represents the lesser significant base integer values; arithmetic encoding the first array using a number of symbols based on an expected maximum first array value; and arithmetic encoding the second array using a number of symbols based on an associated first array value.

According to a sixteenth aspect there is provided a computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio; separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represents the lesser significant base integer values; arithmetic decoding the first array using a number of symbols based on an expected maximum first array value; and arithmetic decoding the second array using a number of symbols based on an associated first array decoded value.

An apparatus comprising means for performing the actions of the method as described above.

An apparatus configured to perform the actions of the method as described above.

A computer program comprising program instructions for causing a computer to perform the method as described above.

A computer program product stored on a medium may cause an apparatus to perform the method as described herein.

An electronic device may comprise apparatus as described herein.

A chipset may comprise apparatus as described herein.

Embodiments of the present application aim to address problems associated with the state of the art.

Summary of the Figures

For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:

Figure 1 shows schematically a system of apparatus suitable for implementing some embodiments;

Figure 2 shows schematically the analysis processor as shown in figure 1 according to some embodiments;

Figure 3a shows schematically an azimuth encoder according to some embodiments;

Figure 3b shows schematically an elevation encoder according to some embodiments;

Figure 3c shows schematically a ratio encoder according to some embodiments;

Figures 4a to 4c show flow diagrams of the operation of the azimuth encoder according to some embodiments;

Figures 5a to 5c show flow diagrams of the operation of the elevation encoder according to some embodiments;

Figures 6a to 6c show flow diagrams of the operation of the ratio encoder according to some embodiments;

Figure 7 shows a flow diagram of an operation of a decoder suitable for decoding encoded azimuth, elevation or ratio parameters encoded by the azimuth encoder, elevation encoder or ratio encoder respectively; and

Figure 8 shows schematically an example device suitable for implementing the apparatus shown.

Embodiments of the Application

The following describes in further detail suitable apparatus and possible mechanisms for the provision of effective spatial analysis derived metadata parameters. In the following discussions a multi-channel system is discussed with respect to a multi-channel microphone implementation. However, as discussed above, the input format may be any suitable input format, such as multi-channel loudspeaker, Ambisonic (FOA/HOA) etc. It is understood that in some embodiments the channel location is based on a location of the microphone or is a virtual location or direction. Furthermore the output of the example system is a multi-channel loudspeaker arrangement. However it is understood that the output may be rendered to the user via means other than loudspeakers. Furthermore the multi-channel loudspeaker signals may be generalised to be two or more playback audio signals.

The metadata consists at least of elevation, azimuth and the energy ratio of a resulting direction, for each considered time/frequency subband. The direction parameter components, the azimuth and the elevation, are extracted from the audio data and then quantized to a given quantization resolution. The resulting indexes must be further compressed for efficient transmission. For high bitrates, high quality lossless encoding of the metadata is needed.

The concept as discussed hereafter is to treat the digits of the indexes as symbols and encode them while keeping track of the time-variable statistics and constraints. This may for example be implemented in some embodiments as discussed herein in further detail by encoding the digits using a controlled adaptive arithmetic coder.

With respect to Figure 1 an example apparatus and system for implementing embodiments of the application are shown. The system 100 is shown with an ‘analysis’ part 121 and a ‘synthesis’ part 131. The ‘analysis’ part 121 is the part from receiving the multi-channel loudspeaker signals up to an encoding of the metadata and downmix signal and the ‘synthesis’ part 131 is the part from a decoding of the encoded metadata and downmix signal to the presentation of the re-generated signal (for example in multi-channel loudspeaker form).

The input to the system 100 and the ‘analysis’ part 121 is the multi-channel signals 102. In the following examples a microphone channel signal input is described, however any suitable input (or synthetic multi-channel) format may be implemented in other embodiments. The multi-channel signals are passed to a downmixer 103 and to an analysis processor 105.

In some embodiments the downmixer 103 is configured to receive the multi-channel signals and downmix the signals to a determined number of channels and output the downmix signals 104. For example the downmixer 103 may be configured to generate a 2 audio channel downmix of the multi-channel signals. The determined number of channels may be any suitable number of channels. In some embodiments the downmixer 103 is optional and the multi-channel signals are passed unprocessed to an encoder 107 in the same manner as the downmix signals are in this example.

In some embodiments the analysis processor 105 is also configured to receive the multi-channel signals and analyse the signals to produce metadata 106 associated with the multi-channel signals and thus associated with the downmix signals 104. The analysis processor 105 may be configured to generate the metadata which may comprise, for each time-frequency analysis interval, a direction parameter 108 and an energy ratio parameter 110 (and in some embodiments a coherence parameter, and a diffuseness parameter). The direction and energy ratio may in some embodiments be considered to be spatial audio parameters. In other words the spatial audio parameters comprise parameters which aim to characterize the sound-field created by the multi-channel signals (or two or more playback audio signals in general).

In some embodiments the parameters generated may differ from frequency band to frequency band. Thus for example in band X all of the parameters are generated and transmitted, whereas in band Y only one of the parameters is generated and transmitted, and furthermore in band Z no parameters are generated or transmitted. A practical example of this may be that for some frequency bands such as the highest band some of the parameters are not required for perceptual reasons. The downmix signals 104 and the metadata 106 may be passed to an encoder 107.

The encoder 107 may comprise an audio encoder core 109 which is configured to receive the downmix (or otherwise) signals 104 and generate a suitable encoding of these audio signals. The encoder 107 can in some embodiments be a computer (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs. The encoding may be implemented using any suitable scheme. The encoder 107 may furthermore comprise a metadata encoder/quantizer 111 which is configured to receive the metadata and output an encoded or compressed form of the information. In some embodiments the encoder 107 may further interleave, multiplex to a single data stream or embed the metadata within encoded downmix signals before transmission or storage shown in Figure 1 by the dashed line. The multiplexing may be implemented using any suitable scheme.

In the decoder side, the received or retrieved data (stream) may be received by a decoder/demultiplexer 133. The decoder/demultiplexer 133 may demultiplex the encoded streams and pass the audio encoded stream to a downmix extractor 135 which is configured to decode the audio signals to obtain the downmix signals. Similarly the decoder/demultiplexer 133 may comprise a metadata extractor 137 which is configured to receive the encoded metadata and generate metadata. The decoder/demultiplexer 133 can in some embodiments be a computer (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs.

The decoded metadata and downmix audio signals may be passed to a synthesis processor 139.

The system 100 ‘synthesis’ part 131 further shows a synthesis processor 139 configured to receive the downmix and the metadata and re-create in any suitable format a synthesized spatial audio in the form of multi-channel signals 110 (these may be in multichannel loudspeaker format or in some embodiments any suitable output format such as binaural or Ambisonics signals, depending on the use case) based on the downmix signals and the metadata.

Therefore in summary first the system (analysis part) is configured to receive multi-channel audio signals. Then the system (analysis part) is configured to generate a downmix or otherwise generate a suitable transport audio signal (for example by selecting some of the audio signal channels).

The system is then configured to encode for storage/transmission the downmix (or more generally the transport) signal and the metadata.

After this the system may store/transmit the encoded downmix and metadata.

The system may retrieve/receive the encoded downmix and metadata.

Then the system is configured to extract the downmix and metadata from encoded downmix and metadata parameters, for example demultiplex and decode the encoded downmix and metadata parameters.

The system (synthesis part) is configured to synthesize an output multi-channel audio signal based on extracted downmix of multi-channel audio signals and metadata.

With respect to Figure 2 an example analysis processor 105 and metadata encoder/quantizer 111 (as shown in Figure 1) according to some embodiments is described in further detail.

The analysis processor 105 in some embodiments comprises a time-frequency domain transformer 201.

In some embodiments the time-frequency domain transformer 201 is configured to receive the multi-channel signals 102 and apply a suitable time to frequency domain transform such as a Short Time Fourier Transform (STFT) in order to convert the input time domain signals into suitable time-frequency signals. These time-frequency signals may be passed to a direction analyser 203 and to a signal analyser 205.

Thus for example the time-frequency signals 202 may be represented in the time-frequency domain representation by

S_i(b, n),

where b is the frequency bin index and n is the frame index and i is the channel index. In another expression, n can be considered as a time index with a lower sampling rate than that of the original time-domain signals. These frequency bins can be grouped into subbands that group one or more of the bins into a subband of a band index k = 0,..., K-1. Each subband k has a lowest bin b_k,low and a highest bin b_k,high, and the subband contains all bins from b_k,low to b_k,high. The widths of the subbands can approximate any suitable distribution, for example the Equivalent Rectangular Bandwidth (ERB) scale or the Bark scale.

In some embodiments the analysis processor 105 comprises a direction analyser 203. The direction analyser 203 may be configured to receive the time-frequency signals 202 and based on these signals estimate direction parameters 108. The direction parameters may be determined based on any audio based ‘direction’ determination.

For example in some embodiments the direction analyser 203 is configured to estimate the direction with two or more signal inputs. This represents the simplest configuration to estimate a ‘direction’; more complex processing may be performed with even more signals.

The direction analyser 203 may thus be configured to provide an azimuth and an elevation for each frequency band and temporal frame, denoted as azimuth φ(k,n) and elevation θ(k,n). The direction parameter 108 may also be passed to a signal analyser 205. The direction analyser 203 may also be configured to determine an energy ratio parameter 110. The energy ratio may be considered to be a determination of the energy of the audio signal which can be considered to arrive from a direction. The direct-to-total energy ratio r(k,n) can be estimated, e.g., using a stability measure of the directional estimate, or using any correlation measure, or any other suitable method to obtain a ratio parameter.

The estimated direction 108 and energy ratio 110 parameters may be output (and passed to an encoder 107).

Therefore in summary the analysis processor is configured to receive time domain multichannel or other format such as microphone or ambisonic audio signals.

Following this the analysis processor may apply a time domain to frequency domain transform (e.g. STFT) to generate suitable time-frequency domain signals for analysis and then apply direction analysis to determine direction and energy ratio parameters. The analysis processor may then be configured to output the determined parameters.

Although directions and ratios are here expressed for each time index n, in some embodiments the parameters may be combined over several time indices. The same applies for the frequency axis: as has been expressed, the direction of several frequency bins b could be expressed by one direction parameter in band k consisting of several frequency bins b. The same applies for all of the discussed spatial parameters herein.

As also shown in Figure 2 an example metadata encoder/quantizer 111 is shown according to some embodiments. The metadata encoder/quantizer 111 may comprise a direction index generator 205. The direction index generator 205 is configured to receive the direction parameters (such as the azimuth φ(k, n) and elevation θ(k, n)) 108 and generate a quantized output. In some embodiments the quantization is based on an arrangement of spheres forming a spherical grid arranged in rings on a ‘surface’ sphere. In other words the spherical grid uses the idea of covering a sphere with smaller spheres and considering the centres of the smaller spheres as points defining a grid of almost equidistant directions. The smaller spheres therefore define cones or solid angles about the centre point which can be indexed according to any suitable indexing algorithm. Although spherical quantization is described here, any suitable quantization, linear or non-linear, may be used.

The direction index generator 205 having defined this indexing may then be configured to receive the elevation-azimuth values and output elevation and azimuth direction indices.

In some embodiments the direction index generator 205 is further configured to receive the energy ratio, apply a quantization to the energy ratio and map the energy ratio to an index based on the quantization applied.

The quantized direction index values and the energy ratios (which may also be quantized as index values) may then be passed to the encoder 207.

In the following examples the audio metadata indices comprise azimuth, elevation, and energy ratio data for each subband. The azimuth value can be represented on 9 bits (in other words values from 0 to 511 or 2⁹-1), the elevation on 7 bits (in other words values from 0 to 127 or 2⁷-1) and the energy ratio on 8 bits (in other words values from 0 to 255 or 2⁸-1). If there are N subbands per frame then (16+8)xN bits will be needed to store the uncompressed directional metadata for each frame.

The encoder 207 having received the direction index values and the energy ratios may then compress or encode the values to attempt to produce a more bit efficient lossless representation of the direction index and energy ratio values. This is achieved as described in further detail by treating the digits of the indexes and values as symbols and encoding them separately and keeping track of the time-variable statistics and constraints. All the arithmetic encoding methods described below are adaptive. The proposed encoding methods described below may be considered to be digit arithmetic coding (digit AC).

With respect to Figures 3a to 3c, example azimuth, elevation and energy ratio index encoders are shown.
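As a small worked illustration of the bit count above, the uncompressed size per frame can be computed as follows. This is only a sketch; the constant and function names are hypothetical and do not come from the application.

```c
#include <stddef.h>

/* Bit widths stated in the text: azimuth 9, elevation 7, energy ratio 8. */
enum { AZIMUTH_BITS = 9, ELEVATION_BITS = 7, RATIO_BITS = 8 };

/* Uncompressed metadata size in bits for one frame of num_subbands
   sub-bands, i.e. (16 + 8) x N. The function name is hypothetical. */
size_t uncompressed_metadata_bits(size_t num_subbands)
{
    return (size_t)(AZIMUTH_BITS + ELEVATION_BITS + RATIO_BITS) * num_subbands;
}
```

For example, a frame of 5 sub-bands would need 24 x 5 = 120 bits before any compression.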
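The text does not specify how the adaptive statistics are maintained. Purely as an illustration, one common choice is a frequency-count model that starts flat and increments the count of each coded symbol; an arithmetic coder would then consume the cumulative-frequency interval the model reports. The sketch below assumes that choice; the type AdaptiveModel and the functions model_init and model_code are hypothetical names, and the arithmetic coder itself is not shown.

```c
#define MAX_SYMBOLS 16

/* Hypothetical adaptive frequency model for one digit position. */
typedef struct {
    unsigned count[MAX_SYMBOLS]; /* occurrences seen so far, per symbol */
    unsigned total;              /* sum of all counts */
    int num_symbols;             /* alphabet size, e.g. 6, 10, 3 or 2 */
} AdaptiveModel;

/* Assumption: start from a flat distribution with every count set to 1. */
void model_init(AdaptiveModel *m, int num_symbols)
{
    m->num_symbols = num_symbols;
    m->total = (unsigned)num_symbols;
    for (int s = 0; s < num_symbols; ++s)
        m->count[s] = 1;
}

/* Reports the cumulative-frequency interval [*low, *high) that an
   arithmetic coder would use for `symbol`, returns the total the interval
   is measured against, and then adapts the statistics. */
unsigned model_code(AdaptiveModel *m, int symbol, unsigned *low, unsigned *high)
{
    unsigned total_before = m->total;
    unsigned cum = 0;
    for (int s = 0; s < symbol; ++s)
        cum += m->count[s];
    *low = cum;
    *high = cum + m->count[symbol];
    m->count[symbol]++; /* adapt: this symbol becomes more probable */
    m->total++;
    return total_before;
}
```

A symbol that recurs gets a wider interval on later calls, which is what lets an adaptive coder spend fewer bits on it.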

With respect to Figure 3a an azimuth encoder 301 is shown. The azimuth encoder 301 is shown receiving the azimuth index output values and generating encoded symbols representing the azimuth index in an efficient manner according to some embodiments.

The azimuth encoder 301 may comprise a frame divider 311. The frame divider is configured to receive the index value for the azimuth value, a value from 000 to 511, and split it according to a base integer expansion. A base integer expansion of a number may be expressed in the form:

$$\text{Value} = \sum_{k} b_k X^{k}$$

(where k is the expansion index, b_k the value of the expansion for k and X the base).

In other words where the azimuth values are represented by a base 10 number the expansion is represented by:

$$\text{Azimuth Value} = \sum_{k} b_k 10^{k}, \quad \text{where } k = 2, 1, 0$$

The frame divider can be configured to split the azimuth value into a first value, data1, representing the b₂(10²) value (or most significant base integer expansion) and which has the value range from 0 to 5, a second value, data2, representing the b₁(10¹) value (or next significant base integer expansion) and which has the value range from 0 to 9, and a third value, data3, representing the b₀(10⁰) value (or least significant base integer expansion) and which has the value range from 0 to 9. In the following examples the values are represented by base 10 values. In such a system the values are decimal expansion based values and the maximum available number of symbols representing each value is 10. In some embodiments where the values are represented by base systems other than 10 similar methods may be applied where the maximum available number of symbols is determined based on the base selected or chosen to represent the values. For example where the values are represented by a hexadecimal base then the integer expansion is to split the values into integers representing the 16⁰, 16¹, ... expansions.
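The base integer expansion described above can be sketched as a repeated remainder and division by the base. The function name base_expand and its parameters are hypothetical; the digits are written most significant first, so that for a three-digit base 10 expansion digits[0], digits[1] and digits[2] correspond to data1, data2 and data3.

```c
#include <stddef.h>

/* Splits `value` into `num_digits` digits in the given base, most
   significant digit first. The function name is hypothetical. */
void base_expand(unsigned value, unsigned base, unsigned *digits, size_t num_digits)
{
    /* Fill from the least significant position backwards. */
    for (size_t k = num_digits; k-- > 0; ) {
        digits[k] = value % base;
        value /= base;
    }
}
```

With base 10 and three digits, the azimuth index 511 yields the digits 5, 1, 1; with base 16 the same index yields 1, 15, 15.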

The data values may then be passed to an array of arithmetic coders. Therefore the data1 values can in some embodiments be passed to arithmetic coder 1 313₁, data2 values can in some embodiments be passed to arithmetic coder 2 313₂, and data3 values can in some embodiments be passed to arithmetic coder 3 313₃. Each of the arithmetic coders is configured to encode the values based on the number of symbols.

Furthermore in some embodiments the arithmetic coder 2 313₂ is also configured to receive the data1 value and encode the data2 value based on the value of the data1 value.

Additionally in some embodiments the azimuth encoder comprises a memory/comparator 315. The memory/comparator 315 is configured to receive the current azimuth index value and compare it to the previous frame azimuth index value for the same sub-band. The output of the comparison may be passed and used to control the behaviour of the arithmetic coders 313₁, 313₂, 313₃.

With respect to Figures 4a to 4c are shown flow diagrams describing the operations of the azimuth encoder according to some embodiments.

With respect to Figure 4a is shown a first example operation of the azimuth encoder. The initial operation is receiving the integer index of the azimuth data represented as an array of three digits (each digit having a value between 0 and 9) as shown in Figure 4a by step 401.

The next operation is to group the first digits of all the indexes per frame in the array data1, the second digits of all the indexes per frame in the array data2, and the third digits of all the indexes per frame in the array data3. This operation of dividing each sub-band index value into the digits (and then grouping the digits into data1, data2, and data3 arrays) is shown in Figure 4a by step 403.
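A minimal sketch of the grouping step for a frame of base 10 indices is given below. The array names data1, data2 and data3 follow the text; the function name group_digits and its signature are assumptions made for illustration.

```c
#include <stddef.h>

/* For each of the n sub-band indices of a frame, places the hundreds
   digit in data1, the tens digit in data2 and the units digit in data3.
   Indices are assumed to lie in 0..999. */
void group_digits(const int *index, size_t n, int *data1, int *data2, int *data3)
{
    for (size_t i = 0; i < n; ++i) {
        data1[i] = index[i] / 100;       /* most significant digit */
        data2[i] = (index[i] / 10) % 10; /* next significant digit */
        data3[i] = index[i] % 10;        /* least significant digit */
    }
}
```

Each of the three arrays can then be handed to its own coder, so that the statistics of each digit position are tracked separately.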

The next operation is to use arithmetic coding for 6 symbols (0:5) to encode the data1 as shown in Figure 4a by step 405.

The following operation is to determine whether the corresponding data1 value is the maximum value (5). Where the data1 value is not the maximum value (in other words a value from 0 to 4) then use arithmetic coding for 10 symbols (0:9) to encode the data2. However where the value is the maximum value (in other words 5) then use an arithmetic encoder with only two symbols (because the azimuth index is at most 511, so the data2 value can only be either a 0 or 1). The operation of checking the value of data1 and encoding the corresponding data2 value using a different number of symbols based on the value of data1 is shown in Figure 4a by step 407.
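The choice of alphabet size for data2 follows directly from the largest possible index. As a sketch, assuming three-digit base 10 indices, the number of symbols could be derived as below; the function name second_digit_num_symbols is hypothetical.

```c
/* Number of symbols needed to code the second digit (data2), given the
   largest possible index and the already coded first digit (data1).
   Assumes a three-digit base 10 expansion. */
int second_digit_num_symbols(int max_index, int data1)
{
    int max_data1 = max_index / 100;
    if (data1 < max_data1)
        return 10;                    /* data2 may take any value 0..9 */
    return (max_index / 10) % 10 + 1; /* data2 is limited by max_index */
}
```

For the azimuth (maximum index 511) this gives 2 symbols when data1 is 5 and 10 symbols otherwise; the same rule gives 3 symbols for a maximum index of 127 when data1 is 1.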

The next operation is to use arithmetic coding for 10 symbols (0:9) to encode the data3 value as shown in Figure 4a by step 409.

With respect to Figure 4b is shown a second example operation of the azimuth encoder.

The initial operation is receiving the integer index of the azimuth data represented as an array of three digits (each digit having a value between 0 and 9) as shown in Figure 4b by step 401.

The next operation is to group the first digits of all the indexes per frame in the array data1, the second digits of all the indexes per frame in the array data2, and the third digits of all the indexes per frame in the array data3. This operation of dividing each sub-band index value into the digits (and then grouping the digits into data1, data2, and data3 arrays) is shown in Figure 4b by step 403.

The next operation is to use arithmetic coding for 6 symbols (0:5) to encode the data1 as shown in Figure 4b by step 405.

The following operation is to determine whether the corresponding data1 value is the maximum value (5). Where the data1 value is not the maximum value (in other words a value from 0 to 4) then use arithmetic coding for 10 symbols (0:9) to encode the data2. However where the value is the maximum value (in other words 5) then use an arithmetic encoder with only two symbols (because the azimuth index is at most 511, so the data2 value can only be either a 0 or 1). Furthermore a comparison is made between the current sub-band value of data1 and the previous sub-band value of data1, in other words data1[i]==data1[i-1]. Where the two values are the same then the current sub-band data2 value data2[i] is encoded with a Golomb Rice code of order 0. The operation of checking the value of data1, comparing it to the previous sub-band value and encoding the corresponding data2 value based on the value of data1 and the comparison is shown in Figure 4b by step 417.
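A Golomb Rice code of order 0 has no remainder bits, so it reduces to a unary code of length value + 1. The sketch below writes the codeword as a string of '1' and '0' characters for readability. The bit convention (ones terminated by a zero) and the function name golomb_rice0 are assumptions; the text does not state which convention is used.

```c
#include <stddef.h>

/* Writes the order-0 Golomb Rice (unary) codeword for `value` into `out`
   as a NUL-terminated string of '1' and '0' characters.
   Assumed convention: `value` ones followed by a terminating zero.
   Returns the codeword length in bits, or 0 if `out` is too small. */
size_t golomb_rice0(unsigned value, char *out, size_t out_size)
{
    size_t len = (size_t)value + 1;
    if (out_size < len + 1)
        return 0;
    for (size_t i = 0; i < value; ++i)
        out[i] = '1';
    out[value] = '0';
    out[len] = '\0';
    return len;
}
```

Small values therefore cost few bits: the value 0 costs a single bit, while the value 9 costs ten.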

The next operation is to use arithmetic coding for 10 symbols (0:9) to encode the data3 value as shown in Figure 4b by step 419.

With respect to Figure 4c is shown a third example operation of the azimuth encoder.

The initial operation is receiving the integer index of the azimuth data represented as an array of three digits (each digit having a value between 0 and 9) as shown in Figure 4c by step 401.

The next operation is to group the first digits of all the indexes per frame in the array data1, the second digits of all the indexes per frame in the array data2, and the third digits of all the indexes per frame in the array data3. This operation of dividing each sub-band index value into the digits (and then grouping the digits into data1, data2, and data3 arrays) is shown in Figure 4c by step 403.

The next operation is to use arithmetic coding for 6 symbols (0:5) to encode the data1 as shown in Figure 4c by step 405. The operation which follows is one which determines whether a sub-band index value is to be coded or to be skipped. In some embodiments if the azimuth index value at position “i” is equal to the previous azimuth index value the azimuth index value at position “i” is marked to be skipped from coding. This may in some embodiments be implemented by generating an array skip_data[N] signaling if the data2 and data3 are to be coded or not. In other words the array indicates for a position “i” whether it is equal to the previous index or not. If the current value of data1 is equal to the previous one and the current index is equal to the previous one, a bit “0” is added to the bitstream, signaling that the corresponding data2 and data3 values do not need to be encoded. If the current value of data1 is equal to the previous one and the current index is not equal to the previous one, one bit “1” is added to the bitstream signaling that the corresponding data2 and data3 components are to be encoded.

The operation of determining whether to skip the index value from coding or not is shown in Figure 4c by step 427.

Where the index value is to be coded then the following operation is to determine whether the corresponding data1 value is the maximum value (5). Where the data1 value is not the maximum value (in other words a value from 0 to 4) then use arithmetic coding for 10 symbols (0:9) to encode the data2. However where the value is the maximum value (in other words 5) then use an arithmetic encoder with only two symbols (because the azimuth index is at most 511, so the data2 value can only be either a 0 or 1). The operation of checking the value of data1 and encoding the corresponding data2 value based on the value of data1 and the comparison is shown in Figure 4c by step 429.

The next operation is to use arithmetic coding for 10 symbols (0:9) to encode the data3 value as shown in Figure 4c by step 431.

The method as shown in Figure 4c may for example be implemented in C code.
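The following is not the listing of the application but a minimal sketch of the control flow of Figure 4c, written under stated assumptions. It does not run an arithmetic coder; instead it records how many skip-flag bits would be written and how many symbols each of the three coders would receive. The type CodingTrace and the function trace_figure_4c are hypothetical names, and only skip_data is taken from the text.

```c
#include <stddef.h>

/* Hypothetical record of what the Figure 4c flow would emit for a frame. */
typedef struct {
    size_t flag_bits;     /* raw skip-flag bits added to the bitstream */
    size_t data1_symbols; /* symbols for coder 1 (6-symbol alphabet) */
    size_t data2_symbols; /* symbols for coder 2 (10 or 2 symbols) */
    size_t data3_symbols; /* symbols for coder 3 (10-symbol alphabet) */
} CodingTrace;

/* index: azimuth indices (0..511) of the n sub-bands of one frame.
   skip_data[i] is set to 1 where data2 and data3 are skipped.
   In the text all of data1 is coded first (step 405); the counts
   gathered here do not depend on that ordering. */
CodingTrace trace_figure_4c(const int *index, size_t n, int *skip_data)
{
    CodingTrace t = {0, 0, 0, 0};
    for (size_t i = 0; i < n; ++i) {
        int d1 = index[i] / 100;
        t.data1_symbols++;                       /* step 405 */
        skip_data[i] = 0;
        if (i > 0 && d1 == index[i - 1] / 100) { /* step 427 */
            /* One flag bit: "0" means skip, "1" means code data2/data3. */
            t.flag_bits++;
            skip_data[i] = (index[i] == index[i - 1]);
        }
        if (!skip_data[i]) {
            t.data2_symbols++;                   /* step 429 */
            t.data3_symbols++;                   /* step 431 */
        }
    }
    return t;
}
```

No flag bit is spent when data1 differs from the previous sub-band, since the decoder can infer from the already decoded data1 values that the index has changed.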

In the examples further embodiments may be determined by performing the encoding methods as shown in Figures 4b and 4c and selecting the one which produces the fewer number of bits. In such embodiments one bit is used per frame to signal which method is chosen.
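The selection between the two methods can be sketched as a comparison of their bit counts plus the one signalling bit per frame. The mapping of the flag value to a method (0 for Figure 4b, 1 for Figure 4c), the tie-breaking rule and all names below are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical result of the per-frame method selection. */
typedef struct {
    int method_flag;   /* assumed mapping: 0 = Figure 4b, 1 = Figure 4c */
    size_t total_bits; /* chosen method's bits plus the 1 signalling bit */
} MethodChoice;

/* Picks the method producing fewer bits; ties go to the Figure 4b method. */
MethodChoice choose_method(size_t bits_4b, size_t bits_4c)
{
    MethodChoice c;
    c.method_flag = (bits_4c < bits_4b) ? 1 : 0;
    c.total_bits = 1 + (c.method_flag ? bits_4c : bits_4b);
    return c;
}
```

For example, if the two methods produce 40 and 35 bits respectively, the second is chosen and the frame costs 36 bits including the signalling bit.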

With respect to Figure 3b an elevation encoder 320 is shown. The elevation encoder 320 is shown receiving the elevation index output values and generating encoded symbols representing the elevation index in an efficient manner according to some embodiments.

The elevation encoder 320 may comprise a frame divider 321. The frame divider is configured to receive the index value for the elevation value, a value from 000 to 127, and split it into a first value, data1, representing the b₂(10²) value and which has the range from 0 to 1, a second value, data2, representing the b₁(10¹) value and which has the range from 0 to 9, and a third value, data3, representing the b₀(10⁰) value and which has the range from 0 to 9.

The data values may then be passed to an array of arithmetic coders. Therefore the data1 values can in some embodiments be passed to arithmetic coder 1 323₁, data2 values can in some embodiments be passed to arithmetic coder 2 323₂, and data3 values can in some embodiments be passed to arithmetic coder 3 323₃. Each of the arithmetic coders is configured to encode the values based on the number of symbols. Furthermore in some embodiments the arithmetic coder 2 323₂ is also configured to receive the data1 value and encode the data2 value based on the value of the data1 value.

Additionally in some embodiments the elevation encoder comprises a memory/comparator 325. The memory/comparator 325 is configured to receive the current elevation index value and compare it to the previous frame elevation index value for the same sub-band. The output of the comparison may be passed and used to control the behaviour of the arithmetic coders 323₁, 323₂, 323₃.

With respect to Figures 5a to 5c are shown flow diagrams describing the operations of the elevation encoder according to some embodiments.

With respect to Figure 5a is shown a first example operation of the elevation encoder.

The initial operation is receiving the integer index of the elevation data represented as an array of three digits (each digit having a value between 0 and 9) as shown in Figure 5a by step 501.

The next operation is to group the first digits of all the indexes per frame in the array data1, the second digits of all the indexes per frame in the array data2, and the third digits of all the indexes per frame in the array data3. This operation of dividing each sub-band index value into the digits (and then grouping the digits into data1, data2, and data3 arrays) is shown in Figure 5a by step 503.

The next operation is to use arithmetic coding for 2 symbols (0:1) to encode the data1 as shown in Figure 5a by step 505.

The following operation is to determine whether the corresponding data1 value is the maximum value (1). Where the data1 value is not the maximum value (in other words a value 0) then use arithmetic coding for 10 symbols (0:9) to encode the data2. However where the value is the maximum value (in other words 1) then use an arithmetic encoder with only three symbols (because the elevation index is at most 127, so the data2 value can only be either a 0, 1 or 2). The operation of checking the value of data1 and encoding the corresponding data2 value using a different number of symbols based on the value of data1 is shown in Figure 5a by step 507.

The next operation is to use arithmetic coding for 10 symbols (0:9) to encode the data3 value as shown in Figure 5a by step 509.

With respect to Figure 5b is shown a second example operation of the elevation encoder.

The initial operation is receiving the integer index of the elevation data represented as an array of three digits (each digit having a value between 0 and 9) as shown in Figure 5b by step 501.

The next operation is to group the first digits of all the indexes per frame in the array data1, the second digits of all the indexes per frame in the array data2, and the third digits of all the indexes per frame in the array data3. This operation of dividing each sub-band index value into the digits (and then grouping the digits into data1, data2, and data3 arrays) is shown in Figure 5b by step 503.

The next operation is to use arithmetic coding for 2 symbols (0:1) to encode the data1 as shown in Figure 5b by step 505.

The following operation is to determine whether the corresponding data1 value is the maximum value (1). Where the data1 value is not the maximum value (in other words a value 0) then use arithmetic coding for 10 symbols (0:9) to encode the data2. However where the value is the maximum value (in other words 1) then use an arithmetic encoder with only three symbols (because the elevation index is at most 127, so the data2 value can only be either a 0, 1 or 2). Furthermore a comparison is made between the current sub-band value of data1 and the previous sub-band value of data1, in other words data1[i]==data1[i-1]. Where the two values are the same then the current sub-band data2 value data2[i] is encoded with a Golomb Rice code of order 0. The operation of checking the value of data1, comparing it to the previous sub-band value and encoding the corresponding data2 value based on the value of data1 and the comparison is shown in Figure 5b by step 517.

The next operation is to use arithmetic coding for 10 symbols (0:9) to encode the data3 value as shown in Figure 5b by step 519.
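A Golomb-Rice code of order 0 reduces to a unary code, and the per-sub-band choice between it and the arithmetic coder can be sketched as below. This is an illustrative sketch under assumptions: the bit convention (ones terminated by a zero), the treatment of the first sub-band (no predecessor, so arithmetic coding) and all function names are hypothetical and not taken from the description.

```python
def golomb_rice_order0(value):
    """Golomb-Rice code with parameter 0, i.e. a unary code.

    Assumed convention: `value` ones followed by one terminating zero.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    return "1" * value + "0"


def data2_coder_choice(data1, i, max_d1=1, restricted=3):
    """Name the coder used for data2[i] in the Figure 5b style flow."""
    if i > 0 and data1[i] == data1[i - 1]:
        return "golomb-rice-0"
    n_symbols = restricted if data1[i] == max_d1 else 10
    return "arithmetic-%d" % n_symbols


data1 = [0, 0, 1]  # hypothetical first digits for three sub-bands
print([data2_coder_choice(data1, i) for i in range(len(data1))])
print(golomb_rice_order0(3))
```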

With respect to Figure 5c is shown a third example operation of the elevation encoder. The initial operation is receiving the integer index of the elevation data represented as an array of three digits (each digit having a value between 0 and 9) as shown in Figure 5c by step 501.

The next operation is to group the first digits of all the indexes per frame in the array data1, the second digits of all the indexes per frame in the array data2, and the third digits of all the indexes per frame in the array data3. This operation of dividing each sub-band index value into the digits (and then grouping the digits into data1, data2, and data3 arrays) is shown in Figure 5c by step 503.

The next operation is to use arithmetic coding with 2 symbols (0:1) to encode the data1 value as shown in Figure 5c by step 505.

The operation which follows is one which determines whether a sub-band index value is to be coded or to be skipped. In some embodiments if the azimuth index value at position “i” is equal to the previous azimuth index value, the azimuth index value at position “i” is marked to be skipped from coding. This may in some embodiments be implemented by generating an array skip_data[N] signaling whether the data2 and data3 values are to be coded or not. In other words the array indicates for a position “i” whether the index is equal to the previous index or not. If the current value of data1 is equal to the previous one and the current index is equal to the previous one, a bit “0” is added to the bitstream, signaling that the corresponding data2 and data3 values do not need to be encoded. If the current value of data1 is equal to the previous one and the current index is not equal to the previous one, a bit “1” is added to the bitstream, signaling that the corresponding data2 and data3 components are to be encoded.

The operation of determining whether to skip the index value from coding or not is shown in Figure 5c by step 527.
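The skip signalling described above can be sketched as follows. This is a minimal sketch with assumptions: no bit is written for the first position or when data1 differs from the previous data1 (the index then cannot equal the previous index), and the function name and return layout are hypothetical.

```python
def skip_signalling(indexes):
    """Return (bits, coded) for a frame of three-digit index values.

    bits  : skip bits added to the bitstream, 0 = skip, 1 = encode.
    coded : per position, True if data2/data3 are to be encoded.
    """
    data1 = [v // 100 for v in indexes]
    bits, coded = [], []
    for i, value in enumerate(indexes):
        if i > 0 and data1[i] == data1[i - 1]:
            same = value == indexes[i - 1]
            bits.append(0 if same else 1)
            coded.append(not same)
        else:
            # No previous value, or data1 changed: always encode, no bit.
            coded.append(True)
    return bits, coded


bits, coded = skip_signalling([104, 104, 110, 23, 23])
print(bits, coded)
```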

Where the index value is to be coded then the following operation is to determine whether the corresponding data1 value is the maximum value (1). Where the data1 value is not the maximum value (in other words a value 0) then arithmetic coding with 10 symbols (0:9) is used to encode the data2 value. However where the data1 value is the maximum value (in other words 1) then an arithmetic encoder with only three symbols is used (because the second digit can only be 0, 1 or 2). The operation of checking the value of data1 and encoding the corresponding data2 value based on the value of data1 and the comparison is shown in Figure 5c by step 529.

The next operation is to use arithmetic coding with 10 symbols (0:9) to encode the data3 value as shown in Figure 5c by step 531.

In further embodiments both of the encoding methods shown in Figures 5a and 5c are performed and the one which produces the fewer bits is selected. In such embodiments one bit is used per frame to signal which method is chosen.
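The per-frame selection can be sketched as below, assuming each method has already produced its bit string for the frame. The mapping of the signalling bit ("0" for the Figure 5a style method, "1" for the Figure 5c style method), the tie-breaking rule and the names are assumptions made for illustration.

```python
def select_method(bits_a, bits_c):
    """Prefix the shorter payload with a one-bit method indicator.

    bits_a : frame encoded with the Figure 5a style method (hypothetical).
    bits_c : frame encoded with the Figure 5c style method (hypothetical).
    Ties are resolved in favour of bits_a (an arbitrary choice).
    """
    if len(bits_c) < len(bits_a):
        return "1" + bits_c
    return "0" + bits_a


print(select_method("10101", "110"))
```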

With respect to Figure 3c an energy ratio encoder 330 is shown. The energy ratio encoder 330 is shown receiving the energy ratio output values and generating encoded symbols representing the energy ratio in an efficient manner according to some embodiments.

The energy ratio encoder 330 may comprise a frame divider 331. The frame divider is configured to receive the energy ratio value, a value from 000 to 255 in some embodiments, and split it into a first value, data1, representing the b₂(10²) value and which has the range from 0 to 2, a second value, data2, representing the b₁(10¹) value and which has the range from 0 to 9, and a third value, data3, representing the b₀(10⁰) value and which has the range from 0 to 9.

The data values may then be passed to an array of arithmetic coders. Therefore the data1 values can in some embodiments be passed to arithmetic coder 1 333₁, data2 values can in some embodiments be passed to arithmetic coder 2 333₂, and data3 values can in some embodiments be passed to arithmetic coder 3 333₃. Each of the arithmetic coders is configured to encode the values based on the number of symbols.

Furthermore in some embodiments the arithmetic coder 2 333₂ is also configured to receive the data1 value and encode the data2 value based on the value of the data1 value.

With respect to Figure 6 is shown a flow diagram describing the operations of the energy ratio encoder according to some embodiments. The initial operation is receiving the energy ratio data represented as an array of three digits (each digit having a value between 0 and 9) as shown in Figure 6 by step 601.

The next operation is to group the first digits of all the values per frame in the array data1, the second digits of all the values per frame in the array data2, and the third digits of all the values per frame in the array data3. This operation of dividing each sub-band value into the digits (and then grouping the digits into data1, data2, and data3 arrays) is shown in Figure 6 by step 603.

The next operation is to use arithmetic coding with three symbols (0:2) to encode the data1 value as shown in Figure 6 by step 605.

The following operation is to determine whether the corresponding data1 value is the maximum value (2). Where the data1 value is not the maximum value (in other words a value 0 or 1) then arithmetic coding with 10 symbols (0:9) is used to encode the data2 value. However where the data1 value is the maximum value (in other words 2) then an arithmetic encoder with only six symbols is used (because the second digit can only be 0, 1, 2, 3, 4 or 5). The operation of checking the value of data1 and encoding the corresponding data2 value using a different number of symbols based on the value of data1 is shown in Figure 6 by step 607.

The next operation is to use arithmetic coding for 10 symbols (0:9) to encode the data3 value as shown in Figure 6 by step 609.
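The effect of the restricted alphabet on the energy ratio can be illustrated with the sketch below. It assumes an 8-bit ratio value (0 to 255) and, purely for illustration, a uniform probability model in which a symbol from an alphabet of n symbols costs log2(n) bits; an actual arithmetic coder would use its own, typically adaptive, model. The function names are hypothetical.

```python
import math


def ratio_symbol_plan(value, max_d1=2, restricted=6):
    """Return [(digit, alphabet_size), ...] for data1, data2, data3."""
    if not 0 <= value <= 255:
        raise ValueError("energy ratio value must be in 0..255")
    d1, d2, d3 = value // 100, (value // 10) % 10, value % 10
    n2 = restricted if d1 == max_d1 else 10
    return [(d1, 3), (d2, n2), (d3, 10)]


def uniform_cost_bits(plan):
    """Cost in bits under an assumed uniform model per alphabet."""
    return sum(math.log2(n) for _, n in plan)


for value in (99, 255):
    plan = ratio_symbol_plan(value)
    print(value, plan, round(uniform_cost_bits(plan), 2))
```

Under this assumed model a value whose first digit is 2 costs fewer bits than one whose first digit is 0 or 1, because data2 is drawn from six rather than ten symbols.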

Although the above examples have been described with respect to the encoding of the index values and ratio values with 9, 7 and 8 bits (in other words 2⁹, 2⁷ and 2⁸ possible values), it would be understood that the number of values may change and, associated with this, the maximum value for the data1 value and similarly the number of symbols used to encode the data2 value.

The decoding of the bitstream within the metadata extractor 137 may be implemented by performing the reverse operations to that of the metadata encoder 111.

Thus in some embodiments the metadata extractor 137 comprises a first decoder which restores the encoded data1, data2, and data3 values to azimuth, elevation and ratio values. The azimuth and elevation index values may then be passed to an index to elevation (or index to azimuth) converter. The index to elevation (azimuth) converter is configured to receive the index and quantization map information and generate an approximate or quantized elevation (or azimuth) output. In some embodiments

The energy ratio values may also in some embodiments be dequantized or mapped back to the 0 to 1 range.

An example of the decoder configured to decode the encoded azimuth values to generate a decoded azimuth index in response to the encoding shown with respect to Figure 4a may be as follows:

Decode arithmetic encoded data for data1
For i=1:N
    If data1[i] == max symbol (for example 5)
        Decode data2[i] with arithmetic coding having the restricted number of symbols (for example 0 or 1)
    Else
        Decode data2[i] with arithmetic coding with 10 symbols
    End
End for
Arithmetic coding based decode data3
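The decoding steps listed above can be sketched in Python as follows. This is an illustrative sketch only: the arithmetic decoder is abstracted as a hypothetical callable decode_symbol(alphabet_size) that returns the next symbol, the 9-bit azimuth case (max symbol 5, two restricted symbols) is assumed, and make_mock_decoder is a stand-in used only to exercise the control flow.

```python
def decode_azimuth_frame(decode_symbol, n_subbands, max_d1=5, restricted=2):
    """Rebuild azimuth indexes from data1, data2 and data3 digit arrays."""
    # data1 uses an alphabet of max_d1 + 1 symbols (0..max_d1).
    data1 = [decode_symbol(max_d1 + 1) for _ in range(n_subbands)]
    # data2 uses the restricted alphabet only where data1 is the max symbol.
    data2 = [decode_symbol(restricted if d1 == max_d1 else 10) for d1 in data1]
    data3 = [decode_symbol(10) for _ in range(n_subbands)]
    return [100 * a + 10 * b + c for a, b, c in zip(data1, data2, data3)]


def make_mock_decoder(symbols):
    """Hypothetical stand-in for an arithmetic decoder (not a real one)."""
    stream = iter(symbols)

    def decode_symbol(alphabet_size):
        symbol = next(stream)
        if not 0 <= symbol < alphabet_size:
            raise ValueError("symbol outside the expected alphabet")
        return symbol

    return decode_symbol


decoder = make_mock_decoder([5, 0, 1, 3, 1, 7])
print(decode_azimuth_frame(decoder, n_subbands=2))
```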

With respect to Figure 7 is shown a flow diagram of the decoding of the example encoded bit stream where a selection of the methods shown in Figures 4b and 4c is used to encode the azimuth index.

Thus for example a first operation would be to read the indicator bit to select between the method shown in Figure 4b (method B) and Figure 4c (method C) as shown in Figure 7 by step 701.

Where the determination indicates that the encoding was based on Method C then data1 is decoded using arithmetic decoding as shown in Figure 7 by step 703.

Then the method checks whether the symbol has been skipped from being encoded. If the check is positive that the symbol has been skipped then data2 is the same as the previous index data2 and data3 is the same as the previous index data3 as shown in Figure 7 by step 705. Where the check is negative and the symbol was not skipped then the next step is one of checking whether the data1 was the max value (for example 5). If the data1 value is the maximum value then decode data2 using the arithmetic decoder and using the reduced number of symbols (for example 2). However if the data1 is not the max value then decode data2 using an arithmetic coder with the usual range of symbols (10 symbols). Furthermore decode data3 using an arithmetic coder with the usual range of symbols (10 symbols). The operation of decoding the data2 and data3 values when they have been encoded is shown in Figure 7 by step 707.

Where the determination indicates that the encoding was based on Method B then data1 is decoded using arithmetic decoding as shown in Figure 7 by step 713.

Then the method checks whether data1 is equal to the previous data1 value. If this is correct and data1 is equal to the previous data1 value then decode data2 using a Golomb-Rice code of order 0. Else check whether the data1 value was the max value (for example 5). If the data1 value is the maximum value then decode data2 using the arithmetic decoder and using the reduced number of symbols (for example 2). However if the data1 value is not the max value then decode data2 using an arithmetic coder with the usual range of symbols (10 symbols). Furthermore decode data3 using an arithmetic coder with the usual range of symbols (10 symbols). The operation of decoding the data2 and data3 values when they have been encoded is shown in Figure 7 by step 715.
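The Method C branch of the decoder (steps 705 and 707) can be sketched as below. This is a hedged sketch: read_bit and decode_symbol are hypothetical callables standing in for the bitstream reader and the arithmetic decoder, the interleaving of skip bits and symbols is an assumption, and a skip bit is assumed to be present only where data1 equals the previous data1.

```python
def decode_method_c(data1, read_bit, decode_symbol, max_d1=5, restricted=2):
    """Recover data2 and data3 given already decoded data1 values."""
    data2, data3 = [], []
    for i, d1 in enumerate(data1):
        # Bit 0 signals that data2/data3 were skipped: copy previous values.
        if i > 0 and d1 == data1[i - 1] and read_bit() == 0:
            data2.append(data2[-1])
            data3.append(data3[-1])
            continue
        data2.append(decode_symbol(restricted if d1 == max_d1 else 10))
        data3.append(decode_symbol(10))
    return data2, data3


# Mock inputs: two skip bits and six symbols for four sub-bands.
bits = iter([0, 1])
symbols = iter([1, 1, 0, 9, 3, 7])
print(decode_method_c([5, 5, 5, 0], lambda: next(bits), lambda n: next(symbols)))
```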

The “max symbol” of data1 and the restricted number of symbols depend on the number of bits used: if 9 bits, then 511 is the max value of the index, therefore the “max symbol” is “5” and the restricted number of symbols for the second digit is “2”; if 8 bits, then 255 is the max value of the index, therefore the “max symbol” is “2” and the restricted number of symbols for the second digit is “6”; if 7 bits, then 127 is the max value of the index, therefore the “max symbol” is “1” and the restricted number of symbols for the second digit is “3”.
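The relationship between the number of bits, the max symbol and the restricted number of symbols can be checked with a short sketch. The function name is hypothetical, and the sketch assumes that the maximum index has exactly three decimal digits, which holds for the 7, 8 and 9 bit cases given above.

```python
def digit_limits(bits):
    """Return (max_index, max_symbol, restricted_symbols) for base 10.

    max_symbol         : largest possible first digit (data1).
    restricted_symbols : number of second-digit values possible when
                         data1 equals max_symbol.
    """
    max_index = (1 << bits) - 1
    if not 100 <= max_index <= 999:
        raise ValueError("sketch assumes a three-digit maximum index")
    max_symbol = max_index // 100
    restricted_symbols = (max_index // 10) % 10 + 1
    return max_index, max_symbol, restricted_symbols


for bits in (9, 8, 7):
    print(bits, digit_limits(bits))
```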

With these encoding/decoding techniques the compression ratio for test data is as shown in the following table. As a reference it is compared with the results based on the zero order entropy of the indexes.

From the table the Huffman encoding method gives good results, but its dictionary is trained on the data and therefore the method is very sensitive to angle deviations in the direction estimation.

Alternatively, the digit AC method makes no assumption about the data distribution. The best result for the azimuth encoding is obtained combining the Huffman encoding and the digit AC methods.

In some embodiments the examples shown in the methods above can be modified. For example the energy ratio, which may have a more consistent distribution, may be encoded using a Huffman coding method whereas the azimuth and elevation are encoded using the methods described above.

With respect to Figure 8 an example electronic device which may be used as the analysis or synthesis device is shown. The device may be any suitable electronics device or apparatus. For example in some embodiments the device 1400 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc.

In some embodiments the device 1400 comprises at least one processor or central processing unit 1407. The processor 1407 can be configured to execute various program codes such as the methods described herein.

In some embodiments the device 1400 comprises a memory 1411. In some embodiments the at least one processor 1407 is coupled to the memory 1411. The memory 1411 can be any suitable storage means. In some embodiments the memory 1411 comprises a program code section for storing program codes implementable upon the processor 1407. Furthermore in some embodiments the memory 1411 can further comprise a stored data section for storing data, for example data that has been processed or is to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1407 whenever needed via the memory-processor coupling.

In some embodiments the device 1400 comprises a user interface 1405. The user interface 1405 can be coupled in some embodiments to the processor 1407. In some embodiments the processor 1407 can control the operation of the user interface 1405 and receive inputs from the user interface 1405. In some embodiments the user interface 1405 can enable a user to input commands to the device 1400, for example via a keypad. In some embodiments the user interface 1405 can enable the user to obtain information from the device 1400. For example the user interface 1405 may comprise a display configured to display information from the device 1400 to the user. The user interface 1405 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1400 and further displaying information to the user of the device 1400. In some embodiments the user interface 1405 may be the user interface for communicating with the position determiner as described herein.

In some embodiments the device 1400 comprises an input/output port 1409. The input/output port 1409 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 1407 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.

The transceiver can communicate with further apparatus by any suitable known communications protocol. For example in some embodiments the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or infrared data communication pathway (IRDA).

The transceiver input/output port 1409 may be configured to receive the signals and in some embodiments determine the parameters as described herein by using the processor 1407 executing suitable code. Furthermore the device may generate a suitable downmix signal and parameter output to be transmitted to the synthesis device.

In some embodiments the device 1400 may be employed as at least part of the synthesis device. As such the input/output port 1409 may be configured to receive the downmix signals and in some embodiments the parameters determined at the capture device or processing device as described herein, and generate a suitable audio signal format output by using the processor 1407 executing suitable code. The input/output port 1409 may be coupled to any suitable audio output for example to a multichannel speaker system and/or headphones or similar.

In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.

The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.

Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.

Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.

Claims

1. An apparatus comprising means for:

receiving a frame of values for sub-bands, the values representing at least one of an azimuth index, elevation index or energy ratio;

separating the values into arrays of data values, each array representing base integer expansion values, such that a first array represents the most significant base integer values, and a second and further arrays represents the lesser significant base integer values;

arithmetic encoding the first array using a number of symbols based on an expected maximum first array value; and

arithmetic encoding the second array using a number of symbols based on an associated first array value.

2. The apparatus as claimed in claim 1, wherein the means for arithmetic encoding the second array using a number of symbols based on the associated first array value is further for:

arithmetic encoding the second array using a reduced number of symbols based on the associated first array value being the expected maximum first array value;

arithmetic encoding the second array using a maximum available number of symbols based on the associated first array value being other than the expected maximum first array value.

3. The apparatus as claimed in claim 2, wherein the reduced number of symbols is based on an expected maximum second array value when the associated first array value is the expected maximum first array value.

4. The apparatus as claimed in any of claims 1 to 3, wherein the means is further for: arithmetic encoding the third array using the maximum available number of symbols.

5. The apparatus as claimed in any of claims 1 to 4, wherein the means for arithmetic encoding the second array using a number of symbols based on the associated first array value is further for:

comparing a value within the first array to a preceding value in the first array; arithmetic encoding the second array using a number of symbols based on the associated first array value based on the associated first array value being different to the preceding first array value; and

entropy encoding the second array value based on the associated first array value being the same as the preceding first array value.

6. The apparatus as claimed in any of claims 1 to 4, wherein the means for arithmetic encoding the second array using a number of symbols based on the associated first array value is further for:

comparing a value within the second array to a preceding value in the second array, wherein the means for arithmetic encoding the second array using a number is further for:

arithmetic encoding the second array using a number of symbols based on the value of the second array being different to the preceding value in the second array; and

generating an indicator indicating a skipping of the arithmetic encoding based on the value of the second array being the same as the preceding value in the second array.

7. The apparatus as claimed in any claim dependent on claim 4, wherein the means for arithmetic encoding the third array using a maximum available number of symbols is further for:

comparing a value within the third array to the preceding value in the third array: arithmetic encoding the third array using 10 symbols based on the value of the third array being different to the preceding value in the third array; and

generating an indicator indicating the skipping of the arithmetic encoding based on the value of the third array being the same as the preceding value in the third array.

8. The apparatus as claimed in any of claims 1 to 7, wherein the means is further for generating an indicator identifying a type of encoding applied to at least one of the first array, the second array and the third array.

9. The apparatus as claimed in any claim dependent on claim 2, wherein a maximum available number of symbols is determined based on the value base.

10. The apparatus as claimed in claim 9, wherein the maximum available number of symbols determined based on the value base is 10 when the value is a base 10 number.

11. An apparatus comprising means for:

arithmetic decoding the first array using a number of symbols based on an expected maximum first array value; and

arithmetic decoding the second array using a number of symbols based on an associated first array decoded value.

12. The apparatus as claimed in claim 11, wherein the means for arithmetic decoding the second array using a number of symbols based on the associated first array decoded value is further for:

arithmetic decoding the second array using a reduced number of symbols based on the associated first array decoded value being the expected maximum first array value; and

arithmetic decoding the second array using a maximum available number of symbols based on the associated first array decoded value being other than the expected maximum first array value.

13. The apparatus as claimed in claim 12, wherein the reduced number of symbols is based on an expected maximum second array value when the associated first array decoded value is the expected maximum first array value.

14. The apparatus as claimed in any of claims 11 to 13, wherein the means is further for:

arithmetic decoding the third array using a maximum available number of symbols.

15. The apparatus as claimed in any of claims 11 to 14, wherein the means for arithmetic decoding the second array using a number of symbols based on the associated first array decoded value is further for:

comparing a decoded value within the first array to a preceding decoded value in the first array, wherein the means for arithmetic decoding the second array using a number of symbols based on the associated first array decoded value is further for:

arithmetic decoding the second array using a number of symbols based on the associated first array decoded value being different to the preceding first array decoded value; and

entropy decoding the second array value based on the associated first array decoded value being the same as the preceding first array decoded value.

16. The apparatus as claimed in any of claims 11 to 14, wherein the means for arithmetic decoding the second array using a number of symbols based on the associated first array decoded value is further for:

receiving an indicator indicating the skipping of the arithmetic encoding of the second array;

arithmetic decoding the second array value using a number of symbols based on the value of the indicator indicating the arithmetic encoding of the second array; and

copying a preceding decoded value in the second array to be a decoded value in the second array based on a value of the indicator indicating a skipping of the arithmetic encoding of the second array.

17. The apparatus as claimed in any claim dependent on claim 14, wherein the means for arithmetic decoding the third array using a maximum available number of symbols is further for:

receiving an indicator indicating the skipping of the arithmetic encoding of the third array;

arithmetic decoding the second array value using a maximum available number of symbols based on the value of the indicator indicating the arithmetic encoding of the second array; and

copying a preceding decoded value in the third array to be a decoded value in the third array based on a value of the indicator indicating a skipping of the arithmetic encoding of the third array.

18. The apparatus as claimed in any of claims 11 to 17, wherein the means are further for:

receiving an indicator identifying a type of encoding applied to at least one of the first array, the second array and the third array; and

decoding the at least one of the first array, the second array and the third array based on the indicator.

19. The apparatus as claimed in any claim dependent on claim 12, wherein a maximum available number of symbols is determined based on the value base.

20. The apparatus as claimed in claim 19, wherein the maximum available number of symbols is determined based on the value base is 10 when the value is a base 10 number.