WO2024115052A1 - Parametric spatial audio encoding - Google Patents

Parametric spatial audio encoding

Publication number: WO2024115052A1
Application number: PCT/EP2023/080907
Authority: WIPO (PCT)
Other languages: French (fr)
Inventors: Adriana Vasilache, Mikko-Ville Laitinen
Original assignee: Nokia Technologies Oy
Application filed by Nokia Technologies Oy

Abstract

An apparatus, for encoding an audio object parameter; the apparatus comprising means for: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.

Description

PARAMETRIC SPATIAL AUDIO ENCODING
Field
The present application relates to apparatus and methods for spatial audio representation and encoding, but not exclusively for audio representation for an audio encoder.
Background
Parametric spatial audio processing is a field of audio signal processing where the spatial aspect of the sound is described using a set of parameters. For example, in parametric spatial audio capture from microphone arrays, it is a typical and an effective choice to estimate from the microphone array signals a set of parameters such as directions of the sound in frequency bands, and the ratios between the directional and non-directional parts of the captured sound in frequency bands. These parameters are known to well describe the perceptual spatial properties of the captured sound at the position of the microphone array. These parameters can be utilized in synthesis of the spatial sound accordingly, for headphones binaurally, for loudspeakers, or to other formats, such as Ambisonics.
The directions and direct-to-total energy ratios in frequency bands are thus a parameterization that is particularly effective for spatial audio capture.
A parameter set consisting of a direction parameter in frequency bands and an energy ratio parameter in frequency bands (indicating the directionality of the sound) can be also utilized as the spatial metadata (which may also include other parameters such as surround coherence, spread coherence, number of directions, distance etc) for an audio codec. For example, these parameters can be estimated from microphone-array captured audio signals, and for example a stereo or mono signal can be generated from the microphone array signals to be conveyed with the spatial metadata.
Immersive audio codecs are being implemented supporting a multitude of operating points ranging from a low bit rate operation to transparency. An example of such a codec is the Immersive Voice and Audio Services (IVAS) codec which is being designed to be suitable for use over a communications network such as a 3GPP 4G/5G network including use in such immersive services as for example immersive voice and audio for virtual reality (VR). This audio codec is expected to handle the encoding, decoding and rendering of speech, music and generic audio. It is furthermore expected to support channel-based audio and scene-based audio inputs including spatial information about the sound field and sound sources. The codec is also expected to operate with low latency to enable conversational services as well as support high error robustness under various transmission conditions.
The stereo signal could be encoded, for example, with an AAC encoder and the mono signal could be encoded with an EVS encoder. A decoder can decode the audio signals into PCM signals and process the sound in frequency bands (using the spatial metadata) to obtain the spatial output, for example a binaural output.
The aforementioned immersive audio codecs are particularly suitable for encoding captured spatial sound from microphone arrays (e.g., in mobile phones, VR cameras, stand-alone microphone arrays). However, such an encoder can have other input types, for example, loudspeaker signals, audio object signals, Ambisonic signals.
Summary
According to a first aspect there is provided an apparatus, for encoding an audio object parameter; the apparatus comprising means for: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
The means for generating the integer value based on the indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects may be for: generating a single number value by appending elements from the vector; and generating the index from the single number, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value reached at the end of the iteration loop.
The means for generating a single number value by appending elements from the vector may be further for transforming the elements from the vector into a base representation based on the first number of bits.
The means for transforming the elements from the vector into a base representation based on the first number of bits, may be for transforming the elements into one of: a base 10 representation when the first number of bits is three; base 16 representation when the first number of bits is four; or base 32 representation when the first number of bits is five.
The means for generating the vector from the selection of the quantized ratio parameters may be for generating the vector from the selection of all but one of the quantized ratio parameters.
The means for generating the vector from the selection of all but one of the quantized ratio parameters may be for: generating a full vector from the quantized ratio parameters for the audio objects; and generating the vector from a selection of all but one of the quantized ratio parameters for the audio objects.
The means for quantizing the ratio parameter with respect to the audio object using the first number of bits may be for scalar quantizing the ratio parameter with respect to the audio object using the first number of bits.
The first number of bits may be three and wherein the integer value may be an integer value in base ten.
The valid vector may be one in which one of: a sum of vector element values may be less than or equal to seven; or no element of the vector has a value which is greater than seven and the sum of vector element values may be less than or equal to seven.
According to a second aspect there is provided an apparatus, for decoding ratio parameters for audio objects, the apparatus comprising means for: obtaining an integer value representing ratio parameters for the audio objects; converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
The means for converting the integer value to the vector representing the selection of quantized ratio parameters based on the indexing of the vector may be for: generating a single number from the integer value, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value; and separating the single number into vector component values to generate the vector.
The means for regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters may be for generating at least one further quantized ratio parameter based on a value of summed elements of the vector subtracted from an expected sum value.
The means for dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify the distribution of the specific object within the object part of the total audio environment may be for scalar dequantizing the ratio parameter with respect to the audio object using a first number of bits.
The first number of bits may be three, the expected sum value may be seven and wherein the integer value may be an integer value in base ten.
According to a third aspect there is provided a method for an apparatus, for encoding an audio object parameter; the method comprising: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects. Generating the integer value based on the indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects may comprise: generating a single number value by appending elements from the vector; and generating the index from the single number, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value reached at the end of the iteration loop.
Generating a single number value by appending elements from the vector may further comprise transforming the elements from the vector into a base representation based on the first number of bits.
Transforming the elements from the vector into a base representation based on the first number of bits, may comprise transforming the elements into one of: a base 10 representation when the first number of bits is three; base 16 representation when the first number of bits is four; or base 32 representation when the first number of bits is five.
Generating the vector from the selection of the quantized ratio parameters may comprise generating the vector from the selection of all but one of the quantized ratio parameters.
Generating the vector from the selection of all but one of the quantized ratio parameters may comprise: generating a full vector from the quantized ratio parameters for the audio objects; and generating the vector from a selection of all but one of the quantized ratio parameters for the audio objects.
Quantizing the ratio parameter with respect to the audio object using the first number of bits may comprise scalar quantizing the ratio parameter with respect to the audio object using the first number of bits.
The first number of bits may be three and wherein the integer value may be an integer value in base ten.
The valid vector may be one in which one of: a sum of vector element values may be less than or equal to seven; or no element of the vector has a value which is greater than seven and the sum of vector element values may be less than or equal to seven. According to a fourth aspect there is provided a method for an apparatus for decoding ratio parameters for audio objects, the method comprising: obtaining an integer value representing ratio parameters for the audio objects; converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
Converting the integer value to the vector representing the selection of quantized ratio parameters based on the indexing of the vector may comprise: generating a single number from the integer value, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value; and separating the single number into vector component values to generate the vector.
Regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters may comprise generating at least one further quantized ratio parameter based on a value of summed elements of the vector subtracted from an expected sum value.
Dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify the distribution of the specific object within the object part of the total audio environment may comprise scalar dequantizing the ratio parameter with respect to the audio object using a first number of bits.
The first number of bits may be three, the expected sum value may be seven and wherein the integer value may be an integer value in base ten.
According to a fifth aspect there is provided an apparatus, for encoding an audio object parameter, the apparatus comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
The apparatus caused to perform generating the integer value based on the indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects may be further caused to perform: generating a single number value by appending elements from the vector; and generating the index from the single number, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value reached at the end of the iteration loop.
The apparatus caused to perform generating a single number value by appending elements from the vector may further be caused to perform transforming the elements from the vector into a base representation based on the first number of bits.
The apparatus caused to perform transforming the elements from the vector into a base representation based on the first number of bits, may be caused to perform transforming the elements into one of: a base 10 representation when the first number of bits is three; base 16 representation when the first number of bits is four; or base 32 representation when the first number of bits is five.
The apparatus caused to perform generating the vector from the selection of the quantized ratio parameters may be further caused to perform generating the vector from the selection of all but one of the quantized ratio parameters.
The apparatus caused to perform generating the vector from the selection of all but one of the quantized ratio parameters may be further caused to perform: generating a full vector from the quantized ratio parameters for the audio objects; and generating the vector from a selection of all but one of the quantized ratio parameters for the audio objects. The apparatus caused to perform quantizing the ratio parameter with respect to the audio object using the first number of bits may be caused to perform scalar quantizing the ratio parameter with respect to the audio object using the first number of bits.
The first number of bits may be three and wherein the integer value may be an integer value in base ten.
The valid vector may be one in which one of: a sum of vector element values may be less than or equal to seven; or no element of the vector has a value which is greater than seven and the sum of vector element values may be less than or equal to seven.
According to a sixth aspect there is provided an apparatus, for decoding ratio parameters for audio objects, the apparatus comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: obtaining an integer value representing ratio parameters for the audio objects; converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
The apparatus caused to perform converting the integer value to the vector representing the selection of quantized ratio parameters based on the indexing of the vector may be caused to perform: generating a single number from the integer value, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value; and separating the single number into vector component values to generate the vector.
The apparatus caused to perform regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters may be further caused to perform generating at least one further quantized ratio parameter based on a value of summed elements of the vector subtracted from an expected sum value.
The apparatus caused to perform dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify the distribution of the specific object within the object part of the total audio environment may be caused to further perform scalar dequantizing the ratio parameter with respect to the audio object using a first number of bits.
The first number of bits may be three, the expected sum value may be seven and wherein the integer value may be an integer value in base ten.
According to a seventh aspect there is provided an apparatus for encoding an audio object parameter; the apparatus comprising: means for obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; means for quantizing the ratio parameters with respect to the audio objects using a first number of bits; means for generating a vector from a selection of the quantized ratio parameters; and means for generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
According to an eighth aspect there is provided an apparatus for decoding ratio parameters for audio objects, the apparatus comprising: means for obtaining an integer value representing ratio parameters for the audio objects; means for converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; means for regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and means for dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
According to a ninth aspect there is provided an apparatus for encoding an audio object parameter, the apparatus comprising: obtaining circuitry configured to obtain a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing circuitry configured to quantize the ratio parameters with respect to the audio objects using a first number of bits; vector generating circuitry for generating a vector from a selection of the quantized ratio parameters; and integer value generating circuitry configured to generate an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
According to a tenth aspect there is provided an apparatus for decoding ratio parameters for audio objects, the apparatus comprising: obtaining circuitry configured to obtain an integer value representing ratio parameters for the audio objects; converting circuitry configured to convert the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating circuitry configured to regenerate at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing circuitry configured to dequantize the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
According to an eleventh aspect there is provided a computer program comprising instructions [or a computer readable medium comprising instructions] for causing an apparatus for encoding an audio object parameter, the apparatus caused to perform at least the following: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects. According to a twelfth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising instructions] for causing an apparatus for decoding ratio parameters for audio objects, the apparatus caused to perform at least the following: obtaining an integer value representing ratio parameters for the audio objects; converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
According to a thirteenth aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus for encoding an audio object parameter, the apparatus caused to perform at least the following: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
According to a fourteenth aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus for decoding ratio parameters for audio objects, the apparatus caused to perform at least the following: obtaining an integer value representing ratio parameters for the audio objects; converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
An apparatus comprising means for performing the actions of the method as described above.
An apparatus configured to perform the actions of the method as described above.
A computer program comprising program instructions for causing a computer to perform the method as described above.
A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
An electronic device may comprise apparatus as described herein.
A chipset may comprise apparatus as described herein.
Embodiments of the present application aim to address problems associated with the state of the art.
Summary of the Figures
For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
Figure 1 shows schematically a system of apparatus suitable for implementing some embodiments;
Figure 2 shows schematically an example metadata extractor and metadata compressor and packer as shown in the system of apparatus as shown in Figure 1 according to some embodiments;
Figure 3 shows a flow diagram of the operation of the example metadata extractor and metadata compressor and packer shown in Figure 2 according to some embodiments;
Figure 4 shows schematically an example ISM vector index generator as shown in Figure 2 according to some embodiments;
Figure 5 shows a flow diagram of the operation of the example ISM vector index generator as shown in Figure 4 according to some embodiments;
Figure 6 shows schematically an example metadata decoder as shown in the system of apparatus as shown in Figure 1 according to some embodiments;
Figure 7 shows a flow diagram of the operation of the example metadata decoder shown in Figure 6 according to some embodiments;
Figure 8 shows schematically a flow diagram of the operation of the example ISM vector index to vector generator as shown in Figure 6 according to some embodiments; and
Figure 9 shows an example device suitable for implementing the apparatus shown in previous figures.
Embodiments of the Application
The following describes in further detail suitable apparatus and possible mechanisms for the encoding of parametric spatial audio signals comprising transport audio signals and spatial metadata. In the following the 3GPP IVAS codec is configured to receive a combined input format mode. The combined input format mode will enable simultaneous encoding of two different audio input formats. An example of two different audio input formats being currently considered is the combination of the MASA format with audio objects. The audio object data may also be known as independent streams with metadata (ISM), and the two terms are used interchangeably herein. Within the encoding of the combined format, the parameter designated ISM ratio is used to describe the distribution of the ISM related audio content with respect to the objects. Specifically the ISM ratio identifies the distribution of a certain object within the object part of the total audio scene. Furthermore, there may be a parameter called MASA-to-total energy ratio that identifies the portion of the MASA stream within the total audio scene (containing the objects and the MASA). Thus, (1 - MASA-to-total energy ratio) identifies the portion of all the objects within the total audio scene.
These parameters are sent to the decoder.
The following concept as discussed in detail herein is the efficient encoding of these ISM ratios. These ISM ratios could be indexed within a pyramidal truncation of a Zn lattice, or encoded by a suitable entropy encoder (such as a Golomb-Rice coder) or a context arithmetic encoder. The encoding using a pyramidal truncation of a Zn lattice is more efficient in terms of compression efficiency, but it needs memory to store index offsets and a description of the vectors of the Zn lattice. The arithmetic encoding methods are generally less efficient because there is typically not sufficient data within an audio frame in order to determine the distribution of the index values.
The embodiments as discussed herein attempt to provide an indexing method for the lattice Zn vectors which does not need to store the index offsets nor the information relative to layer values. The embodiments which employ such methods are efficient for lower dimensions of the lattice and can be used for encoding the ISM ratio index vectors.
Metadata-Assisted Spatial Audio (MASA) is an example of a parametric spatial audio format and representation suitable as an input format for IVAS.
It can be considered an audio representation consisting of ‘N channels + spatial metadata’. It is a scene-based audio format particularly suited for spatial audio capture on practical devices, such as smartphones. The idea is to describe the sound scene in terms of time- and frequency-varying sound source directions and, e.g., energy ratios. Sound energy that is not defined (described) by the directions, is described as diffuse (coming from all directions).
As discussed above spatial metadata associated with the audio signals may comprise multiple parameters (such as multiple directions and, associated with each direction (or directional value), a direct-to-total energy ratio, spread coherence, distance, etc.) per time-frequency tile. The spatial metadata may also comprise other parameters or may be associated with other parameters which are considered to be non-directional (such as surround coherence, diffuse-to-total energy ratio, remainder-to-total energy ratio) but when combined with the directional parameters are able to be used to define the characteristics of the audio scene. For example a reasonable design choice which is able to produce a good quality output is one where the spatial metadata comprises one or more directions for each time-frequency subframe, and where, associated with each direction, direct-to-total ratios, spread coherence, distance values etc are determined.
As described above, parametric spatial metadata representation can use multiple concurrent spatial directions. With MASA, the proposed maximum number of concurrent directions is two. For each concurrent direction, there may be associated parameters such as: Direction index; Direct-to-total ratio; Spread coherence; and Distance. In some embodiments other parameters such as Diffuse-to-total energy ratio; Surround coherence; and Remainder-to-total energy ratio are defined.
To have sufficient frequency and temporal resolution (for example having 5 frequency bands and having 20 milliseconds temporal resolution), in many cases only a few bits can be used per value (e.g., the direction parameter). In practice, this means that the quantization steps are relatively large. Thus, for example, for a certain time-frequency tile the quantization points are at 0, ±45, ±90, ±135, and 180 degrees of azimuth.
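As an illustration of the coarse direction quantization mentioned above, the following is a minimal sketch of a 3-bit azimuth quantizer that snaps an azimuth value to the eight points 0, ±45, ±90, ±135 and 180 degrees; the function name and the wrapping convention are illustrative assumptions and not the codec implementation.

#include <math.h>
#include <stdint.h>

/* Illustrative only: quantize an azimuth (degrees) to one of the eight 45-degree
   points mentioned above, i.e. a 3-bit direction index. */
static int16_t quantize_azimuth_3bit( float azimuth_deg )
{
    /* wrap the azimuth to the interval [-180, 180) */
    float wrapped = fmodf( azimuth_deg + 180.0f, 360.0f );
    if ( wrapped < 0.0f )
    {
        wrapped += 360.0f;
    }
    wrapped -= 180.0f;

    /* round to the nearest multiple of 45 degrees; -180 and +180 are the same point */
    int16_t idx = (int16_t) lroundf( wrapped / 45.0f );
    if ( idx == -4 )
    {
        idx = 4;
    }
    return idx; /* eight distinct values: -3 ... 4 */
}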
The audio objects input format can comprise independent streams with metadata (ISM). In some cases, the metadata may not be available (in which case, e.g., some default values may be assumed). Within the encoding of the combined format, a parameter named ISM ratio has been defined which identifies the distribution of a certain object within the object part of the total audio scene. The concept as discussed herein is the efficient encoding and decoding of these ISM ratio parameters.
In this regard Figure 1 depicts an example apparatus 100 and system for implementing embodiments of the application. The system is shown with an ‘analysis’ part. The ‘analysis’ part is the part from receiving the multi-channel signals up to an encoding of the metadata and downmix signal.
The input to the system ‘analysis’ part is the multi-channel audio signals 102. In the following examples a microphone channel signal input is described, however any suitable input (or synthetic multi-channel) format may be implemented in other embodiments. For example, in some embodiments the spatial analyser and the spatial analysis may be implemented external to the encoder. For example, in some embodiments the spatial (MASA) metadata associated with the audio signals may be provided to an encoder as a separate bit-stream. In some embodiments the spatial (MASA) metadata may be provided as a set of spatial (direction) index values.
Additionally, Figure 1 also depicts multiple audio objects 104 as a further input to the analysis part. As mentioned above these multiple audio objects (or audio object stream) 104 may represent various sound sources within a physical space. Each audio object may be characterized by an audio (object) signal and accompanying metadata comprising directional data (in the form of azimuth and elevation values) which indicate the position or direction of the audio object within a physical space on an audio frame basis.
The multi-channel signals 102 are passed to an analyser and encoder 101, and specifically a transport signal generator 105 and to a metadata generator 103.
In some embodiments the metadata generator 103 is also configured to receive the multi-channel signals and analyse the signals to produce metadata 104 associated with the multi-channel signals and thus associated with the transport signals 106. The analysis processor 103 may be configured to generate the metadata which may comprise, for each time-frequency analysis interval, a direction parameter and an energy ratio parameter and a coherence parameter (and in some embodiments a diffuseness parameter). The direction, energy ratio and coherence parameters may in some embodiments be considered to be MASA spatial audio parameters (or MASA metadata). In other words, the spatial audio parameters comprise parameters which aim to characterize the sound-field created/captured by the multi-channel signals (or two or more audio signals in general).
In some embodiments the parameters generated may differ from frequency band to frequency band. Thus, for example in band X all of the parameters are generated and transmitted, whereas in band Y only one of the parameters is generated and transmitted, and furthermore in band Z no parameters are generated or transmitted. A practical example of this may be that for some frequency bands such as the highest band some of the parameters are not required for perceptual reasons. The transport signals 106 and the metadata 104 may be passed to a combined encoder core 109.
In some embodiments the transport signal generator 105 is configured to receive the multi-channel signals and generate a suitable transport signal comprising a determined number of channels and output the transport signals 106 (MASA transport audio signals). For example, the transport signal generator 105 may be configured to generate a 2-audio channel downmix of the multi-channel signals. The determined number of channels may be any suitable number of channels. The transport signal generator in some embodiments is configured to otherwise select or combine, for example, by beamforming techniques the input audio signals to the determined number of channels and output these as transport signals.
In some embodiments the transport signal generator 105 is optional and the multi-channel signals are passed unprocessed to a combined encoder core 109 in the same manner as the transport signals are in this example.
The audio objects 104 may be passed to the audio object analyser 107 for processing. In some embodiments the audio object analyser 107 analyses the object audio input stream 104 in order to produce suitable audio object transport signals and audio object metadata. For example, the audio object analyser may be configured to produce the audio object transport signals by downmixing the audio signals of the audio objects into a stereo channel using amplitude panning based on the associated audio object directions. Additionally, the audio object analyser may also be configured to produce the audio object metadata associated with the audio object input stream 104. The audio object metadata may comprise direction values which are applicable for all sub-bands. So, if there are 4 objects, there are 4 directions. In the examples described herein the direction values also apply across all of the subframes of the frame, but in some embodiments the temporal resolution of the direction values can differ and the direction values apply for one or more than one sub-frame of the frame. Furthermore the audio object metadata may comprise energy ratios (or ISM ratios). In the following examples the energy ratios (or ISM ratios) are determined for each time-frequency tile for each object.
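As one possible illustration of the amplitude-panning downmix described above, the sketch below pans a single object signal into a stereo pair using a sine/cosine panning law driven by the object azimuth; the specific law, the azimuth convention and the function name are assumptions, as the text does not specify them.

#include <math.h>

/* Illustrative sketch: accumulate one object into a stereo downmix using an
   assumed sine/cosine amplitude panning law (positive azimuth panned left). */
static void pan_object_to_stereo( const float *obj, int num_samples,
                                  float azimuth_deg, float *left, float *right )
{
    const float pi = 3.14159265f;
    float az = azimuth_deg;

    /* clamp the azimuth to the frontal half plane for the stereo pair */
    if ( az > 90.0f )  az = 90.0f;
    if ( az < -90.0f ) az = -90.0f;

    /* map +90 deg (left) ... -90 deg (right) to a pan angle of 0 ... pi/2 */
    float theta  = ( 90.0f - az ) * ( pi / 360.0f );
    float gain_l = cosf( theta );
    float gain_r = sinf( theta );

    for ( int i = 0; i < num_samples; i++ )
    {
        left[i]  += gain_l * obj[i];
        right[i] += gain_r * obj[i];
    }
}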
In some embodiments, the audio object analyser 107 may be sited elsewhere, and the audio objects 104 input to the analyser and encoder 101 then comprise audio object transport signals and audio object metadata.
The analyser and encoder 101 may comprise an audio encoder core 109 which is configured to receive the transport audio (for example downmix) signals 106 and audio object transport signals 128 in order to generate a suitable encoding of these audio signals. The audio encoder core 109 is further configured to receive the output of the metadata generator, the MASA metadata 104 and output an encoded or compressed form of the information as Encoded (MASA) metadata 116.
The analyser and encoder 101 may also comprise an audio object metadata encoder 111 which is similarly configured to receive the audio object metadata 108 and output an encoded or compressed form of the input information as encoded audio object metadata 112.
In some embodiments the combined encoder core 109 can be configured to implement a stream separation metadata determiner and encoder which can be configured to determine the relative contributory proportions of the multi-channel signals 102 (MASA audio signals) and audio objects 104 to the overall audio scene. This measure of proportionality produced by the stream separation metadata determiner and encoder may be used to determine the proportion of quantizing and encoding “effort” expended for the input multi-channel signals 102 and the audio objects 104. In other words, the stream separation metadata determiner and encoder may produce a metric which quantifies proportion of the encoding effort expended on the multichannel audio signals 102 compared to the encoding effort expended on the audio objects 104. This metric may be used to drive the encoding of the audio object metadata 108 and the metadata 104. Furthermore, the metric as determined by the separation metadata determiner and encoder may also be used as an influencing factor in the process of encoding the transport audio signals 106 and audio object transport audio signal 128 performed by the combined encoder core 109. The output metric from the stream separation metadata determiner and encoder can furthermore be represented as encoded stream separation metadata and be combined into the encoded metadata stream from the combined encoder core 109.
The analyser and encoder 101 can in some embodiments be a computer or mobile device (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs. The encoding may be implemented using any suitable scheme. In some embodiments the encoder 107 may further interleave, multiplex to a single data stream or embed the encoded MASA metadata, audio object metadata and stream separation metadata within the encoded (downmixed) transport audio signals before transmission or storage shown in Figure 1 by the dashed line. The multiplexing may be implemented using any suitable scheme.
Furthermore with respect to Figure 1 is shown an associated decoder and renderer 119 which is configured to obtain the bitstream 118 comprising encoded metadata 116, encoded transport audio signals 138 and encoded audio object metadata 112 and from these generate suitable spatial audio output signals. The decoding and processing of such audio signals are known in principle and are not discussed in detail hereafter other than the decoding of the encoded ISM ratio metadata.
With respect to Figure 2 is shown in further detail the audio object analyser 107 and audio object metadata encoder 111 according to some embodiments.
In some embodiments the audio object analyser 107 comprises an ISM ratio generator 201. The ISM ratio generator 201 is configured to generate independent streams with metadata (ISM) ratios associated with the audio object signals 104.
In some embodiments the ISM ratios can be obtained as follows.
First, the object audio signals s_obj(t, i) are transformed to the time-frequency domain S_obj(b, n, i), where t is the temporal sample index, b the frequency bin index, n the temporal frame index, and i the object index. The time-frequency domain signals can, e.g., be obtained via short-time Fourier transform (STFT) or complex-modulated quadrature filterbanks (QMF) (or low-delay variants of them).
Then, the energies of the objects are computed in frequency bands

$$E_{obj}(k, n, i) = \sum_{b = b_{k,low}}^{b_{k,high}} \left| S_{obj}(b, n, i) \right|^2$$

where $b_{k,low}$ is the lowest and $b_{k,high}$ the highest bin of the frequency band k. Then, the ISM ratios r(k, n, i) can be computed as

$$r(k, n, i) = \frac{E_{obj}(k, n, i)}{\sum_{j=1}^{I} E_{obj}(k, n, j)}$$

where I is the number of objects.
In some embodiments, the temporal resolution of the ISM ratios may be different than the temporal resolution of the time-frequency domain audio signals S_obj(b, n, i) (i.e., the temporal resolution of the spatial metadata may be different than the temporal resolution of the time-frequency transform). In those cases, the computation (of the energy and/or the ISM ratios) may include summing over multiple temporal frames of the time-frequency domain audio signals and/or the energy values.
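A minimal sketch of this computation for a single temporal frame is given below; the flattened array layout, the separate real/imaginary arrays and the band-edge arrays are assumptions for illustration only.

/* Illustrative sketch: per-band object energies and ISM ratios for one frame. */
static void compute_ism_ratios( const float *re, const float *im,   /* [num_objects * num_bins], object-major */
                                int num_objects, int num_bins,
                                const int *band_low, const int *band_high, int num_bands,
                                float *ratios )                      /* [num_objects * num_bands], object-major */
{
    for ( int k = 0; k < num_bands; k++ )
    {
        float total = 0.0f;

        /* per-object energy E_obj(k, n, i) summed over the bins of band k */
        for ( int i = 0; i < num_objects; i++ )
        {
            float energy = 0.0f;
            for ( int b = band_low[k]; b <= band_high[k]; b++ )
            {
                float re_bin = re[i * num_bins + b];
                float im_bin = im[i * num_bins + b];
                energy += re_bin * re_bin + im_bin * im_bin;
            }
            ratios[i * num_bands + k] = energy;
            total += energy;
        }

        /* normalise so that the ISM ratios over all objects sum to 1 in each band */
        for ( int i = 0; i < num_objects; i++ )
        {
            ratios[i * num_bands + k] = ( total > 0.0f ) ? ratios[i * num_bands + k] / total
                                                         : 1.0f / (float) num_objects;
        }
    }
}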
The ISM ratios are numbers between 0 and 1 and they correspond to the fraction with which one object is active within the audio scene created by all the objects. For each object there is one ISM ratio per frequency sub-band and time subframe. As discussed above the ISM ratios are passed to the audio object metadata encoder 111.
As discussed above in some embodiments the audio object metadata encoder 111 is configured to encode the ISM ratios. The directions (i.e., the azimuth and elevation angle per object) in some embodiments are forwarded to the encoder 111, and encoded.
In some embodiments the audio object metadata encoder 111 comprises an ISM ratio quantizer 203. The ISM ratio quantizer 203 is configured to receive the ISM ratio values 202 and quantize them. In some embodiments, for each sub-band and time subframe the ratios can be scalarly quantized on nb = 3 bits. As such the quantization of each of the ratios returns a positive integer value in binary from 000 to 111 (or 0 to 7 in decimal or base 10 form). In other embodiments the quantization can be performed using any suitable number of bits. Thus, although the following examples show a uniform scalar quantizer based on 3 bits for each value, a non-uniform scalar quantizer can also be used. The distribution of the indexes does not influence the indexing. However, this could in principle be taken into account by observing that some vector indexes are more probable than others. The use of 3 bits for quantizing allows the following example to use base 10 for representing the number. However quantizers based on more than 3 bits can be employed; for example, a base 16 (hexadecimal) representation could be used for a 4 bit quantization and base 32 for 5 bits.
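For illustration, a minimal uniform 3-bit scalar quantizer for a ratio in [0, 1] could look as follows; the uniform reconstruction levels k/7 are an assumption here, as the text only states that 3 bits are used per ratio.

#include <stdint.h>

/* Sketch of a uniform 3-bit scalar quantizer for an ISM ratio in [0, 1].
   Returns an index 0..7; dequantization maps the index back to k/7. */
static int16_t quantize_ism_ratio_3bit( float ratio )
{
    int16_t idx = (int16_t) ( ratio * 7.0f + 0.5f ); /* nearest of the 8 levels 0/7 ... 7/7 */
    if ( idx < 0 )
    {
        idx = 0;
    }
    if ( idx > 7 )
    {
        idx = 7;
    }
    return idx;
}

static float dequantize_ism_ratio_3bit( int16_t idx )
{
    return (float) idx / 7.0f;
}

With such uniform levels, ratios that sum to 1 across the objects map onto quantization indexes whose sum is (up to rounding) 7, which is the relationship exploited by the indexing described below.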
The quantized ISM ratio index values 204 can then be passed to an ISM ratio vector generator 205.
In some embodiments the audio object metadata encoder 111 comprises an ISM ratio vector generator 205 which is configured to receive the quantized ISM ratio values and generate a vector representation of the ISM ratios for the sub-band. The vectors to be indexed are $\{\mathbf{y} \in \mathbb{Z}^n \mid \sum_i |y_i| = K\}$. In other words, they are the lattice Zn vectors of the pyramidal layer of norm K.
In the following examples there are 3 objects and the ISM ratios have been quantized with 3 bits each. As such, using the above, the example has n = 3 and K = 7. The vector produced by the ISM ratio vector generator 205 therefore in this example can be configured to determine vectors (to be indexed in a manner as described later) as follows, with a small counting sketch given after the list:
[0 0 7] and all different permutations [0 7 0]; [7 0 0]
[0 1 6] and all permutations [0 6 1]; [1 6 0]; [6 1 0]; [6 0 1]; [1 0 6]
[0 2 5] and all permutations [0 5 2]; [2 5 0]; [5 2 0]; [5 0 2]; [2 0 5]
[1 1 5] and all different permutations [1 5 1]; [5 1 1]
[0 3 4] and all permutations [0 4 3]; [3 4 0]; [4 3 0]; [4 0 3]; [3 0 4]
[1 2 4] and all permutations [4 2 1]; [2 4 1]; [1 4 2]; [4 1 2]; [2 1 4]
[1 3 3] and all different permutations [3 1 3]; [3 3 1]
[2 2 3] and all different permutations [2 3 2]; [3 2 2]
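The following stand-alone sketch (illustration only) enumerates the non-negative vectors of Z^3 whose components sum to K = 7 and confirms that there are 36 of them, matching the permutations listed above.

#include <stdio.h>

int main( void )
{
    const int K = 7;
    int count = 0;

    for ( int a = 0; a <= K; a++ )
    {
        for ( int b = 0; b <= K; b++ )
        {
            for ( int c = 0; c <= K; c++ )
            {
                if ( a + b + c == K )
                {
                    count++; /* one vector [a b c] of the pyramidal layer */
                }
            }
        }
    }

    printf( "%d vectors\n", count ); /* prints 36 */
    return 0;
}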
The Vector Quantized ISM ratio index values 206 can then be passed to the ISM vector index generator 207.
The audio object metadata encoder 111 can in some embodiments comprise an ISM vector index generator 207. The ISM vector index generator 207 can be configured to generate a suitable index value for the sub-band representing the vector and pass this as an encoded ISM ratio index value 208 which can for example be passed to a bitstream generator 209 to be included within the bitstream 118.
With respect to Figure 3 is shown a flow diagram which summarises the operations of the example audio object analyser 107 and example audio object metadata encoder 111 shown in Figure 2.
The initial operation is one of receiving/obtaining the independent streams with metadata as shown in Figure 3 by step 301.
Then the following operation is performed of generating ISM ratio values from the independent streams with metadata as shown in Figure 3 by step 303.
Having determined the ISM ratio values, they can be quantized to generate quantized ISM ratio values as shown in Figure 3 by step 305.
From the quantized ISM ratios the next operation is generating vectors from quantized ISM ratio values as shown in Figure 3 by step 307.
Then from the vectors an ISM vector index is generated as shown in Figure 3 by step 309. The ISM vector index can then be output for inclusion to the bitstream as shown in Figure 3 by step 311.
With respect to Figure 4 the ISM vector index generator 207 is shown in further detail.
In this example the ISM vector index generator 207 comprises a vector component selector 401 which is configured to receive the vector quantized ISM ratio index values 206 and select vector components and pass these to a number generator 403.
By definition, for each sub-band and subframe, the sum of the ISM ratios across all objects is 1. As the ISM ratios sum up to 1, there is a corresponding relationship between the quantization indexes: they should sum up to 2^nb - 1 (= K = 7 as indicated above). This enables a reduction of the number of values or vector components that are required to be encoded and sent.
Thus in some embodiments the vector component selector is configured to select and forward the first N-1 components of a N length vector.
For example in the N=3 example above, where for a sub-band the vector quantized ISM ratio index values 206 are [0 4 3], the output selected vector components are 0 and 4 (with 3 not being selected). In this example, although the first N-1 components are selected, it would be understood that any suitable criteria for the selection of the N-1 components can be implemented, for example the ‘last N-1’ components can be selected. The selection or ‘dropping’ of a component can be implemented based on any suitable selection method. For example in some embodiments the first N-1 values are selected (or in other words the selection always ‘drops’ the last value, because the entire space of permutations is used). In some embodiments there can be an estimation of a histogram of the resulting indexes, and the indexes (corresponding to the first components) can be encoded with a variable bitrate.
Furthermore in some embodiments a complexity reduction can be employed by favouring the lowest values at the beginning of the while loops.
The ISM vector index generator 207 further comprises a number generator 403 which is configured to receive the output of the vector component selector 401 and generate a number value from the selected vector components. In some embodiments the number generator 403 is configured to generate a base 10 number from concatenating a base 10 representation of each of the selected components. Thus in the example shown above where the selected components are 0 and 4 then the base 10 value of 4 is generated by the number generator 403.
The number generator can then pass the generated number to a number to index generator 405.
In some embodiments the ISM vector index generator 207 comprises a number to index generator 405. The mapping of the number generated by the number generator to a suitable index is not simply one of taking the number as the index: when the numbers from 0 upwards are considered, not all numbers can be generated by the number generator (as the maximum value for each place in the number is 7 in the K=7 example), and therefore there are some numbers which correspond to valid vectors from the set we want to enumerate and some numbers which will not be generated.
For example, the correspondence for the N=3, K=7 example begins as follows, where a number (formed from the first two vector components) is valid when the sum of its decimal digits is at most 7:

Number    Valid vector?    Index
0         yes              0
1         yes              1
2         yes              2
3         yes              3
4         yes              4
5         yes              5
6         yes              6
7         yes              7
8         no               -
9         no               -
10        yes              8
11        yes              9
...       ...              ...
70        yes              35
This way all of the possible 36 vectors of the pyramidal layer of norm 7 from the lattice Z3 can obtain an index.
The same idea can be applied for n=2, for n=4, or higher.
The indexing function can be defined by the following pseudocode implementation:
Input: vector y of n non-negative integer values whose sum equals K
Output: enumeration index
1. Take the first n-1 vector components of the vector y
2. Form the number x by appending those components as decimal digits:
   $x = \sum_{i=1}^{n-1} y_i \cdot 10^{\,n-1-i}$
3. Let i = 0, index = 0
4. While i < x
   a. If the vector corresponding to i is valid
      i. index = index + 1
   b. End if
   c. i = i + 1
5. End while
6. Return index
Or described in a C function implementation by the following:

int16_t index_slice_enum(
    int16_t *ratio_ism_idx,   /* integer array to be indexed */
    int16_t n )               /* space dimension */
{
    int16_t i;
    int16_t x, index;
    int16_t base;

    if ( n == 2 )
    {
        index = ratio_ism_idx[0];
    }
    else
    {
        /* form the number x by appending the first n-1 components as decimal digits */
        x = ratio_ism_idx[n - 2];
        base = 10;
        for ( i = n - 3; i >= 0; i-- )
        {
            x += ratio_ism_idx[i] * base;
            base *= 10;
        }

        /* the enumeration index is the (zero-based) count of valid numbers up to x;
           K is the common index sum (7 in the 3-bit example), assumed available in scope */
        index = 0;
        i = 0;
        while ( i <= x )
        {
            if ( valid( i, K, n - 1 ) )
            {
                index++;
            }
            i++;
        }
        index = index - 1;
    }

    return index;
}
The function valid() verifies whether a given number corresponds to a valid (n-1)-dimensional array of integers, i.e. one having a Laplacian norm less than or equal to K. In other words, a valid vector is one in which the sum of the vector element values is less than or equal to seven, where the number of iterations for the valid vector loop is seven for two objects (as only one value is checked), seventy for three objects (as two values are checked) or seven hundred for four objects (as three values are checked) in the 3 bit quantization example. Thus there are eight (0, 1, 2, 3, ... 7) valid vectors for two objects, 36 valid vectors for 3 objects (such as shown in the table above) and 120 valid vectors for 4 objects (in general, the number of valid vectors for n objects is the binomial coefficient C(K+n-1, n-1)).
The valid() function is defined as:

int16_t valid( int16_t index, int16_t K, int16_t len )
{
    int16_t out;
    int16_t i, sum, elem;
    int16_t base[4]; /* the maximum space dimension is considered to be 4 */

    set_s( base, 1, len ); /* set all values to 1 */
    for ( i = 1; i < len; i++ )
    {
        base[i] = base[i - 1] * 10;
    }

    /* split the number into its decimal digits and accumulate their sum */
    sum = 0;
    for ( i = len - 1; i >= 0; i-- )
    {
        elem = index / base[i];
        sum += elem;
        index -= elem * base[i];
    }

    if ( sum <= K )
    {
        out = 1;
    }
    else
    {
        out = 0;
    }

    return out;
}
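The following self-contained sketch (illustration only, independent of the codec helper functions such as set_s) reproduces the number-to-index correspondence for the three-object, K = 7 case discussed above.

#include <stdio.h>

/* A number formed from two quantized ratio indexes is valid when the sum of its
   decimal digits does not exceed K. */
static int is_valid_number( int number, int K )
{
    int sum = ( number / 10 ) + ( number % 10 );
    return sum <= K;
}

/* Zero-based enumeration index: count the valid numbers up to and including x. */
static int number_to_index( int number, int K )
{
    int index = 0;
    for ( int i = 0; i <= number; i++ )
    {
        if ( is_valid_number( i, K ) )
        {
            index++;
        }
    }
    return index - 1;
}

int main( void )
{
    const int K = 7;
    const int examples[] = { 0, 4, 7, 10, 17, 70 };

    for ( int i = 0; i < 6; i++ )
    {
        printf( "number %2d -> index %2d\n", examples[i], number_to_index( examples[i], K ) );
    }
    /* prints: 0 -> 0, 4 -> 4, 7 -> 7, 10 -> 8, 17 -> 15, 70 -> 35 */
    return 0;
}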
The encoded ISM ratio index values 208 can then be output.
With respect to Figure 5 is shown a flow diagram which summarises the operations of the example ISM vector index generator 207 shown in Figure 4.
The initial operation is one of receiving/obtaining the Vector Quantized ISM ratio index values as shown in Figure 5 by step 501.
Then the following operation is performed of selecting vector components (for example selecting all but the last component, or in other words removing the last component) as shown in Figure 5 by step 503.
Having determined the selected vector components, they can be used to generate a single number from selected component values, for example by appending the selected vector component values into a single number, which is a decimal or base 10 representation number, as shown in Figure 5 by step 505. Then the following operation is one of generating an index value from the number as shown in Figure 5 by step 507.
This operation of generating an index value from the number can be shown as the loop of:
Start Loop with loop index at 0, vector index is initialized at 0 as shown in Figure 5 by step 571 ;
Check if vector corresponding to loop index value is valid and if so increment vector index as shown in Figure 5 by step 573; and
Increment the loop index; if the loop index is the input number then stop and the index is the vector index value, otherwise perform a new check on the incremented loop index value as shown in Figure 5 by step 575.
Then output the ISM vector index values as shown in Figure 5 by step 511.
With respect to Figure 6 is shown in further detail the decoder as shown in Figure 1 with respect to the decoding and generating of decoded ISM ratio values. Furthermore Figures 7 and 8 show flow diagrams of the example decoder operations as shown in Figure 6 according to some embodiments.
In some embodiments the decoder comprises a bitstream demultiplexer 601 configured to demultiplex the bitstream 118 and extract encoded ISM ratio index values 602. These encoded ISM ratio index values 602 can be passed to a metadata decoder where the decoded ISM ratio values are output and used to generate spatial audio signals.
In other words the method comprises receiving/obtaining the bitstream as shown in Figure 7 by step 701.
Then the encoded ISM ratio index values are demultiplexed from the bitstream as shown in Figure 7 by step 703.
In some embodiments the metadata decoder 603 comprises an ISM vector index to vector generator 605. The ISM vector index to vector generator 605 is configured to generate decoded ISM vector values 606 from the encoded ISM ratio index values 602.
In some embodiments this can be implemented using the opposite index to vector value determination as discussed above.
Thus, as shown in Figure 7 by step 705, there is the operation of generating the ISM vector values from the ISM ratio index values. This operation is further described with respect to Figure 8 and the operations of the example ISM vector index to vector generator shown in Figure 6, where the encoded Vector Quantized ISM ratio index values are received or obtained as shown in Figure 8 by step 801.
Then a loop is started with index value = ISM ratio index value and variable J at 0 as shown in Figure 8 by step 803.
Then within the loop the index value is decremented (by 1) if the vector corresponding to J is valid, as shown in Figure 8 by step 805.
The value of J is incremented (by 1) as shown in Figure 8 by step 807.
Then a check is made on the index value, to determine if it is zero, as shown in Figure 8 by step 809. If the value is not zero then the loop returns to step 805, otherwise it progresses to step 811.
Then the digits of J are assigned as the first n-1 components of the ISM vector as shown in Figure 8 by step 811 .
Furthermore the last component is then generated by subtracting the sum of the n-1 components from the Laplacian norm value K, as shown in Figure 8 by step 813.
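Continuing the hypothetical example given above for the encoder, the received index is 19 and K is seven for three objects. The loop visits J = 0, 1, 2, ... and decrements the received index once for each valid value it passes, so the loop stops with J equal to 24. The digits of 24 give the first two components 2 and 4, and the last component is 7 - (2 + 4) = 1, so the quantized ratio index values (2, 4, 1) are recovered.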
The deindexing function can furthermore be defined by the following pseudo-code:
Input: index
Output: array of integers with Laplacian norm equal to K
1. j = 0
2. While (index > 0)
a. If the vector corresponding to j is valid
i. index = index - 1
b. End if
c. j = j + 1
3. End while
4. Assign the digits of j to the first n-1 components of the output array
5. Calculate the value of the last component as K minus the sum of the first (n-1) components.
Which can further be represented by the corresponding C language function:

static void decode_index_slice(
    int16_t index,
    int16_t *ratio_idx_ism,
    int16_t n,
    int16_t K )
{
    int16_t i, j, sum, base[MAX_NUM_OBJECTS], elem;

    switch ( n )
    {
        case 2:
            ratio_idx_ism[0] = index;
            ratio_idx_ism[1] = K - ratio_idx_ism[0];
            break;
        case 3:
        case 4:
        {
            /* find the number j whose decimal digits form the first n-1 components */
            j = 0;
            while ( index > 0 )
            {
                if ( valid( j, K, n - 1 ) )
                {
                    index--;
                }
                j++;
            }

            base[0] = 1;
            for ( i = 1; i < n - 1; i++ )
            {
                base[i] = base[i - 1] * 10;
            }

            /* split j into its digits and assign them as the first n-1 components */
            sum = 0;
            for ( i = n - 2; i >= 0; i-- )
            {
                elem = j / base[i];
                ratio_idx_ism[n - i - 2] = elem;
                sum += elem;
                j -= elem * base[i];
            }

            /* the last component is K minus the sum of the first n-1 components */
            ratio_idx_ism[n - 1] = K - sum;
            break;
        }
        default:
            break;
    }
}
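As a usage illustration, again hypothetical and not part of the codec, the function above can be exercised with the index from the worked example, assuming it is compiled in the same file as the valid() function and with MAX_NUM_OBJECTS defined as 4:

#include <stdio.h>
#include <stdint.h>

/* decode_index_slice() and valid() above are assumed to be defined in this file */
int main( void )
{
    int16_t ratio_idx_ism[4];

    decode_index_slice( 19, ratio_idx_ism, 3, 7 ); /* index 19, three objects, K = 7 */
    printf( "%d %d %d\n", ratio_idx_ism[0], ratio_idx_ism[1], ratio_idx_ism[2] ); /* expected: 2 4 1 */

    return 0;
}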
The decoded ISM vector values 606 can then be passed to an ISM ratio generator 607.
The metadata decoder 603 can in some embodiments comprise an ISM ratio generator 607 which is configured to receive the decoded ISM vector values 606 and generate decoded ISM ratios 608 in a manner employing the opposite methods to those described above.
The operation of generating the ISM ratios from the ISM vector values is shown in Figure 7 by step 707.
The ISM ratio values can then be output as shown in Figure 7 by step 709.
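The exact dequantization depends on the quantizer used at the encoder side. As a minimal sketch only, assuming a uniform quantizer in which each ratio is reconstructed as its quantized index divided by K (so that the reconstructed ratios of the objects sum to one when the indices sum to K), the ISM ratio generator could operate as follows:

/* hypothetical uniform dequantization of the decoded ISM ratio indices;
   the actual codec quantizer may differ */
static void dequantize_ism_ratios(
    const int16_t *ratio_idx_ism, /* decoded quantized ratio indices */
    float *ratio_ism,             /* reconstructed ISM ratios */
    int16_t n,                    /* number of audio objects */
    int16_t K )                   /* maximum Laplacian norm, e.g. 7 */
{
    int16_t i;

    for ( i = 0; i < n; i++ )
    {
        ratio_ism[i] = (float) ratio_idx_ism[i] / (float) K;
    }
}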
With respect to Figure 9 is shown an example electronic device which may be used as any of the apparatus parts of the system as described above. The device may be any suitable electronics device or apparatus. For example, in some embodiments the device 1400 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc. The device may for example be configured to implement the encoder/analyser part and/or the decoder part as shown in Figure 1 or any functional block as described above.
In some embodiments the device 1400 comprises at least one processor or central processing unit 1407. The processor 1407 can be configured to execute various program codes such as the methods such as described herein.
In some embodiments the device 1400 comprises at least one memory 1411. In some embodiments the at least one processor 1407 is coupled to the memory 1411. The memory 1411 can be any suitable storage means. In some embodiments the memory 1411 comprises a program code section for storing program codes implementable upon the processor 1407. Furthermore, in some embodiments the memory 1411 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1407 whenever needed via the memory-processor coupling.

In some embodiments the device 1400 comprises a user interface 1405. The user interface 1405 can be coupled in some embodiments to the processor 1407. In some embodiments the processor 1407 can control the operation of the user interface 1405 and receive inputs from the user interface 1405. In some embodiments the user interface 1405 can enable a user to input commands to the device 1400, for example via a keypad. In some embodiments the user interface 1405 can enable the user to obtain information from the device 1400. For example the user interface 1405 may comprise a display configured to display information from the device 1400 to the user. The user interface 1405 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1400 and further displaying information to the user of the device 1400. In some embodiments the user interface 1405 may be the user interface for communicating.
In some embodiments the device 1400 comprises an input/output port 1409. The input/output port 1409 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 1407 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
The transceiver can communicate with further apparatus by any suitable known communications protocol. For example in some embodiments the transceiver can use a suitable radio access architecture based on long term evolution advanced (LTE Advanced, LTE-A) or new radio (NR) (or can be referred to as 5G), universal mobile telecommunications system (UMTS) radio access network (UTRAN or E-UTRAN), long term evolution (LTE, the same as E-UTRA), 2G networks (legacy network technology), wireless local area network (WLAN or Wi-Fi), worldwide interoperability for microwave access (WiMAX), Bluetooth®, personal communications services (PCS), ZigBee®, wideband code division multiple access (WCDMA), systems using ultra-wideband (UWB) technology, sensor networks, mobile ad-hoc networks (MANETs), cellular internet of things (IoT) RAN and Internet Protocol multimedia subsystems (IMS), any other suitable option and/or any combination thereof.
The transceiver input/output port 1409 may be configured to receive the signals.
In some embodiments the device 1400 may be employed as at least part of the synthesis device. The input/output port 1409 may be coupled to headphones (which may be headtracked or non-tracked headphones) or similar, and to loudspeakers.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general-purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
As used in this application, the term “circuitry” may refer to one or more or all of the following:
(a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and
(b) combinations of hardware circuits and software, such as (as applicable):
(i) a combination of analog and/or digital hardware circuit(s) with software/firmware and
(ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions); and (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation.
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device. The term “non-transitory,” as used herein, is a limitation of the medium itself (i.e., tangible, not a signal) as opposed to a limitation on data storage persistency (e.g., RAM vs. ROM).
As used herein, “at least one of the following: <a list of two or more elements>” and “at least one of <a list of two or more elements>” and similar wording, where the list of two or more elements are joined by “and” or “or”, mean at least any one of the elements, or at least any two or more of the elements, or at least all the elements.
The foregoing description has provided by way of exemplary and nonlimiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.

CLAIMS:
1. An apparatus, for encoding an audio object parameter; the apparatus comprising means for: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
2. The apparatus as claimed in claim 1 , wherein the means for generating the integer value based on the indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects is for: generating a single number value by appending elements from the vector; and generating the index from the single number, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value reached at the end of the iteration loop.
3. The apparatus as claimed in any of claims 1 or 2, wherein the means for generating the vector from the selection of the quantized ratio parameters is for generating the vector from the selection of all but one of the quantized ratio parameters.
4. The apparatus as claimed in claim 3, wherein the means for generating the vector from the selection of all but one of the quantized ratio parameters is for: generating a full vector from the quantized ratio parameters for the audio objects; and generating the vector from a selection of all but one of the quantized ratio parameters for the audio objects.
5. The apparatus as claimed in any of claims 1 to 4 wherein the means for quantizing the ratio parameter with respect to the audio object using the first number of bits is for scalar quantizing the ratio parameter with respect to the audio object using the first number of bits.
6. The apparatus as claimed in claim 5, wherein the first number of bits is three and wherein the integer value is an integer value in base ten.
7. The apparatus as claimed in claim 6 when dependent on claim 2, wherein the valid vector is one of: a combination of vector element values is less than or equal to seven; or no element of the vector has a value which is greater than seven and the combination of vector element values is less than or equal to seven.
8. An apparatus, for decoding ratio parameters for audio objects, the apparatus comprising means for: obtaining an integer value representing ratio parameters for the audio objects; converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
9. The apparatus as claimed in claim 8, wherein the means for converting the integer value to the vector representing the selection of quantized ratio parameters based on the indexing of the vector is for: generating a single number from the integer value, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value; and separating the single number into vector component values to generate the vector.
10. The apparatus as claimed in any of claims 8 or 9, wherein the means for regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters is for generating at least one further quantized ratio parameter based on a value of summed elements of the vector subtracted from an expected sum value.
11 . The apparatus as claimed in any of claims 8 to 10, wherein the means for dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify the distribution of the specific object within the object part of a total audio environment is for scalar dequantizing the ratio parameter with respect to the audio object using a first number of bits.
12. The apparatus as claimed in claim 11 when dependent on claim 10, wherein the first number of bits is three, the expected sum value is seven and wherein the integer value is an integer value in base ten.
13. A method for encoding an audio object parameter; the method comprising: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
14. The method as claimed in claim 13, wherein generating the integer value based on the indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects comprises: generating a single number value by appending elements from the vector; and generating the index from the single number, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value reached at the end of the iteration loop.
15. The method as claimed in any of claims 13 or 14, wherein generating the vector from the selection of the quantized ratio parameters comprises generating the vector from the selection of all but one of the quantized ratio parameters.
16. The method as claimed in claim 15, wherein generating the vector from the selection of all but one of the quantized ratio parameters comprises: generating a full vector from the quantized ratio parameters for the audio objects; and generating the vector from a selection of all but one of the quantized ratio parameters for the audio objects.
17. The method as claimed in any of claims 13 to 16 wherein quantizing the ratio parameter with respect to the audio object using the first number of bits comprises scalar quantizing the ratio parameter with respect to the audio object using the first number of bits.
18. A method for decoding ratio parameters for audio objects, the method comprising: obtaining an integer value representing ratio parameters for the audio objects; converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
19. The method as claimed in claim 18, wherein converting the integer value to the vector representing the selection of quantized ratio parameters based on the indexing of the vector comprises: generating a single number from the integer value, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value; and separating the single number into vector component values to generate the vector.
20. The method as claimed in any of claims 18 or 19, wherein regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters comprises generating at least one further quantized ratio parameter based on a value of summed elements of the vector subtracted from an expected sum value.